It is a radeon/geforce competitor. Or something like that.
The Cell processor is only really fast when the SPUs are in use, which means 32-bit non-branching floating-point arithmetic. For anything involving integer math, flow control, or uneven memory access, the SPUs defer to the main processor. I'm sure IBM put a decent processor in there, but it doesn't sound like it's anything revolutionary, and there's only the one.
What does this get you? -- A processor that is really good at decoding MPEG, rendering graphics, maybe approximating the physics of flying dragons. It is not a fast general-purpose processor. Operating systems, word processors, databases: these are all integer tasks, and even more so they are branch-heavy tasks. Scientific computation requires double-precision floating point. Photoshop is about the only piece of non-multimedia software that might be able to take advantage of this.
The end result is that this will likely be a great chip for set-top boxes of all sorts, maybe even for video-editing workstations. A G5/pentium replacement it isn't; that's a different ball game.
No it's not at all revolutionary, but you stumbled upon one of the rare bits of insight in the world of computers: NOTHING NEW EVER EVER EVER HAPPENS IN COMPUTERS!
We figured out the basics for how to build a computer by about 1970, and since then there have really only been about three discoveries per decade. Most of those aren't even technology discoveries, but economic or marketing breakthroughs. Most of computer engineering is taking old ideas and incrementing them, or packaging them together in different ratios, or in new combinations.
Is the Cell architecture new? No! It's 8 DSPs chained to a special I/O port on a PowerPC processor. Each has a dedicated block of SRAM, and can mark dirty pages in the PowerPC's main memory. Actually they probably can't; I bet the master processor does that. Will this thing be really fast for decoding MPEG streams? Yeah, it probably will. Will it be fast for general-purpose supercomputing? - Absolutely not - supercomputing is memory limited, not ALU limited. Will it revolutionize servers or desktops? - Absolutely not; they are limited by branches and unstructured memory access, not by computational ability.
The Cell is a really powerful brute-force solution to a problem that not many people have. It's probably a great fit for the PS3, and not for a lot else. If it were, you'd see IBM promoting it in other product lines. They aren't, so it's obviously not that useful outside of embedded DSP-style applications.
This is all extremely plausible. Remember: these are not 8 full-function ALUs like you get in a modern microprocessor. They operate on 32-bit single-precision floats only. For all we know they are each in-order long pipelines, with static branch prediction, no speculative execution, a limited distributed memory scheme, etc.
It's really 8 digital signal processors attached to a PowerPC chip. You can do that today; you just run the data over the PCI bus, rather than on a big in-chip bus. IBM is just gonna stuff all that onto a chip, and hope that the 65nm process will make it fast/cool/cheap enough to put in a console.
It has always been possible to make very fast parallel architectures if you make them very special-purpose. Cell sounds like it will be very difficult to program, and only a couple of game-engine designers will bother trying. The rest will just license those engines and work on game-play, story-line, and art. Same as before, only faster.
I edit digital photos, and like to keep multiple versions along the editing process. I have about 300GB in my working pile, and 2TB in my archive. I use a 100GB LTO tape drive, and it takes 20 tapes to get a single backup. Since I want to keep one set of tapes at my house, one set at the bank, and one set in transit, that's 60 cartridges. I would be very glad to have at least a 1TB tape solution for under 5 grand.
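The cartridge arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, 2TB is taken as 2000GB, and (as in the comment) only the archive is counted, not the 300GB working pile.

```python
import math

def tapes_needed(archive_gb, tape_gb, rotating_sets=3):
    """Return (tapes per full backup, total cartridges across all rotating sets)."""
    per_backup = math.ceil(archive_gb / tape_gb)
    return per_backup, per_backup * rotating_sets

# 2TB archive on 100GB cartridges, three sets (home, bank, in transit)
print(tapes_needed(2000, 100))
# the same archive on a hypothetical 1TB cartridge
print(tapes_needed(2000, 1000))
```

With 1TB cartridges the three-set rotation shrinks from 60 cartridges to 6, which is the whole appeal.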
I can't begin to imagine what the high-def video people do. Yes, you can store a lot of 320x280 MPEG files on a cheap disk. Not so with HD stuff. It's like 200GB per HOUR of footage.
Computers are becoming consumer electronics devices. They're being sold by the same people and to the same people shopping for TVs and microwaves. Yes, it's more fun to watch your crappy football team on a 42-inch plasma panel, but the bulk of TVs sold are still 24-inch CRTs for under $300. The bulk of microwaves cost under $50, and the most common stereos are still bookshelf/boombox systems that cost under $100.
Technodorks are the minority. $500 is STILL far too expensive for a lot of people. For a lot of people the choice is often between buying a computer, or eating. Thus we have a digital divide where the poor in this country can't work a computer, and thus have fewer chances to get decent jobs, thus their children also won't have access to technology, thus...
Even for those with the means, it's a comparison of value. What do you get out of your two thousand dollar laptop? How does it improve your life? Is that worth more than flying your whole family to visit your parents for Christmas? Is that worth more than a whole new wardrobe and a lot of dinners and movie tickets? I'm reluctant to endorse anything Walmart does considering how they treat their employees and what they do to small communities. However, I'm glad that this increases technology access for people who might not otherwise be able to afford it.
Strained silicon is a great technology. You get 30% (or whatever) better electron mobility, which makes for faster capacitor discharge, and thus faster transistor switching, and reduced heat generated in the process. However, you can't strain it much more than they already have. It bought the lithography folks another few hundred megahertz, but it's not going to keep Moore's law alive for another couple decades, at least not by itself.
Strained silicon doesn't really address the two big problems facing silicon lithography: leakage current and ever-rising dynamic power. Even with strained silicon there are still hundreds of millions of capacitors, each charging and discharging billions of times a second. If the frequency increases by some factor X and the number of caps increases by some factor Y, you have to drop the charge on each cap by a factor of X*Y or the dynamic power usage goes up. Furthermore, leakage current, which used to contribute almost nothing to the energy needs of a CPU, now makes up a good chunk of the electrical and heat budgets. The drains are just too close to the body. There are too few atoms of semiconductor to act as a resistor.
It's a nice one-time speed bump, but it doesn't solve the hard problems, just puts them off for another year.
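The X*Y scaling argument can be checked with a toy model. This is a rough sketch, not a real power model: it assumes a fixed supply voltage, assumes every capacitor charges and discharges once per clock cycle, and counts the energy drawn per cycle as charge times voltage. The function name and the numbers are invented for illustration.

```python
import math

def dynamic_power_watts(n_caps, freq_hz, charge_coulombs, volts):
    """Toy model: every cap moves `charge_coulombs` through `volts` once per cycle."""
    return n_caps * freq_hz * charge_coulombs * volts

# Made-up baseline: 200 million caps, 2 GHz, 1 fC per cap, 1.2 V supply
base = dynamic_power_watts(200e6, 2e9, 1e-15, 1.2)

X, Y = 1.5, 2.0  # frequency grows by X, capacitor count grows by Y

# Leave the per-cap charge alone and power grows by X*Y
unscaled = dynamic_power_watts(200e6 * Y, 2e9 * X, 1e-15, 1.2)

# Cut the per-cap charge by X*Y and power stays where it was
scaled = dynamic_power_watts(200e6 * Y, 2e9 * X, 1e-15 / (X * Y), 1.2)

print(base, unscaled, scaled)
```

Leakage is not in this model at all, which is the other half of the complaint.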
No. They saw it coming. There just wasn't much of anything to do about it. Even IBM and HP are unprofitable because of dell and sony.
How is SGI going to compete on their volume? Not at all. Workstations just aren't a big enough market. They tried; remember the visual workstations (SGI, Xeons + WinNT)? I don't think there's anything they could have done. Lost cause.
What's the #1 most sought after market in america as well as Japan? Teenagers. They don't make the most money, but it's all disposable. A few product segments sell exclusively to adults: cars, mortgages, dishwashers. However, the lifestyle products sell to kids who either spend their own money on the item, or talk their parents into buying it. Market research shows that the average teenager controls or directs almost $300 per week of spending. (lies, damn lies, and statistics)
In Japan they just have a lot more 33 year old teenagers running around.
Absolutely. Just like flat panels first took off in Wall Street brokerage firms. People in New York are willing to spend more on things that are small because everyone has a 500 sq ft apartment. In Japan they have 400 sq ft, so they'll spend even more on small crap.
For the rest of our fat-ass country we don't care about small tech toys. We already have 4000 sq ft houses to store all of our ass-fat. Besides, we're busy spending all our money on fatass cars to drag our ass-fat to the Atkins-approved fatorium to optimize our fatilization.
The US is about big, not about sophisticated. We have the biggest cars on the longest highways, with the biggest heated seats. 20-ounce steaks, 2 for $8 if you catch the earlybird, with a side of coleslaw and a porkchop.
Tru64 definitely is a better Unix. However, it has never been very successful in the commercial marketplace. Switching over to a Tru64 system would completely alienate the HP-UX customers, and there are MORE HP-UX customers.
HP-UX is an old relic (seriously, working in the HP kernel is like looking at AT&T Unix from the mid 80's), but it works. It has the virtualization features one might expect from a high-end Unix, and a lot of software support. It doesn't perform particularly quickly, and it's kinda obscure and clunky. What it really lacks is a mature 3rd generation filesystem, which is why it comes bundled with VxFS.
DEC's AdvFS is not really any better than Veritas, except that it's so nicely integrated with TruCluster. I don't know how well Veritas' clustered filesystem works, but it runs on Solaris and Linux. Thus you can run both Linux and HP-UX on vPars within the same HP server, and share data. Though I really liked TruCluster/CFS, it would only be really helpful if they ported it to both Linux and HP-UX.
Apart from making the Tru64 -> HP-UX transition harder, I don't see that they lose any features by picking Veritas over CFS. It just seems like hiring a few more engineers would have been cheaper than playing this back-and-forth game with marketing.
Last time I compiled FreeBSD all the way up, it was on a 200MHz PPro. Yes it takes a while, but on the order of hit-the-button-then-go-out-to-the-bar, not hit-the-button-and-wait-three-paychecks.
I have to imagine the integer units on an iPAQ are at least as fast as those of my old PPro, as must be the memory. The drive is probably slower though.
Considering how tech-savvy the author seems to be, it's interesting that he doesn't understand what an MPP is. The T3D IS an MPP, made in response to a wave of MPP designs in the late 80's taking some of Cray's market share (Thinking Machines, Paragon, etc.). MPP, incidentally, stands for Massively Parallel Processing; massively as in hundreds, not 32.
The T90, on the other hand, is a pure SMP. The processors all sit on a shared bus (actually 256 parallel shared buses). Each CPU was really fast (for the time), had really big pipes to memory, and was really expensive.
The sad thing about nuclear (or wind, solar, etc.) alternatives is that they can only work as alternatives to electrical grid power. Much more than half of the oil used in this country powers cars, trains, trucks, and planes. Another big chunk goes to providing actual heat for buildings and industrial processes.
Since the efficiency of electrical power generation is less than 50%, heat generation will almost certainly never use electrical power. If we can get fuel cell systems to work well with hydrogen, and can efficiently use electricity to isolate hydrogen from water, then MAYBE cars and trucks could benefit from this. Jets will almost certainly remain oil-fueled.
Thus you might be able to impact 40%-50% of the energy needs of the nation. Though obviously it doesn't really matter where you save the oil, just so long as the total consumption goes down.
The big spoilers in all this are India and China, whose energy needs are going to increase constantly over the next several decades. It doesn't matter who is the president, we're gonna have to compete for the oil, whoever controls it. Gas prices are only going to go UP.
Linux has been EXTREMELY successful at breaking into the supercomputer market, particularly in the 40-200 dual-SMP-node size range. Some really large systems run Linux, but not as large a majority as is the case with the smaller systems.
Probably the two most successful supercomputing product lines are the IBM p-series clusters and HP's alpha-server SC line. Sun sells clusters of sunfire 6800's and up.
The IBM uses 1-32 way SMPs running on POWER[3-5] processors and running AIX. They use a proprietary interconnect switch called the SP switch, the most recent incarnation of which is called the Federation switch. The SP switch is faster than the commodity cluster interconnects, but not when you divide it by the number of CPUs in the big SMP boxes. One of the main strengths of the IBM systems is the absolutely HUGE I/O bandwidth of the large systems. I've seen real-world: 15 gigaBYTES per second to a filesystem. The product line has been around for a decade, and is well understood, and a good performer.
The HP (formerly Compaq, formerly DEC) AlphaServer uses smaller SMPs for the SC line. Each node is a 4-way SMP running Tru64 (i.e. OSF/1) Unix with Quadrics as the interconnect. These systems used to totally trounce any other cluster product, but are getting long in the tooth. HP stopped development of new systems about 4 years ago, and recently released the last CPU rev for these systems. They are a little reluctant, however, to publish performance numbers, as the 1.3GHz Alpha processors still get performance about the same as HP's newer Itanium2-based systems. Tru64 Unix, while never commercially successful, performs very well, and is quite suited to this application. The TruCluster software and the integrated cluster filesystem make systems administration on these big systems much simpler than on Linux clusters. Currently the #3 fastest super in the world is an SC. Since this is the end of the product line, they will all probably disappear.
HP also sells some of their HP-UX based systems, which now use Itanium2 processors, to supercomputing customers. Though they make 128-way SMPs, they sell those as super-servers, not for HPTC customers. Instead they stick with the 2 and 4 way nodes with InfiniBand or Myrinet. Most of these are sold with Linux, but some use HP-UX. I can't see any real advantage to using HP-UX in that sort of environment, and apparently no one else can see one either.
Sun sells clusters of their big SMP boxes into the hptc market. These clusters run solaris and use Sun's proprietary sunfire-cluster interconnect. The interconnect is REALLY fast, but maxes out at 4 nodes. Thus you can string together 4 72-way (actually 144-core) SMPs for a 288cpu (576) cluster. These aren't the fastest CPUs on the planet, but aren't half bad. The interconnect is good, and the software availability is really good. I think most of these aren't sold as PURE compute clusters. Rather they are likely some hybrid of super-compute and server-type operations.
Fujitsu sells linux/itanium boxes, but also has a primepower cluster which uses their solaris based sparc64 SMPs clustered using commodity interconnects. As with the sun solution, they provide a host of software options, decent CPUs, and a fairly hefty price tag when compared with commodity clusters.
SGI also still sells their legacy origin (MIPS cpu / Irix os) systems. Since they max out at 1024 CPUs per system image, the really large systems are actually clusters of origins. Irix scales really well to a lot of CPUs, but each of the processors is just not very fast. The system is interconnected with craylink.
That really describes the bulk of the cluster products. As for non-cluster supercomputers: ASCI Red runs a proprietary microkernel OS called PUMA. Cray XT3 runs Catamount. Cray T3E runs UNICOS/mk (ChorusOS). Cray X1 runs UNICOS/mp (IRIX). NEC SX-series run a System V derivative Unix called SUPER-UX. Hitachi SR8000 uses HI-UX/MPP (Mach).
Dig around on top500.org and you can probably find out more.
At NASA sgi has been experimenting with 2048 proc single system image. Since the japan system has yet to be deployed, it will likely be a single system.
The SGI magic memory controller incorporates the NUMAlink (originally called CrayLink) router they leveraged from the T3E work. This router uses wormhole routing, which starts forwarding a packet as soon as the address bytes are read. This means that the added latency of going through several routers is often much less than that of packaging up the packet in the first place. On the hardware side of things it's not the number of router hops that limits the scalability of the system. Rather, the greater the size of the memory, the coarser the size of the directory blocks. With 13TB of memory you are probably invalidating dozens or hundreds of pages at a time. SUCK.
The cache coherency of SGI's cc-NUMA machines makes them incredibly easy to program. However, there is a big overhead. Since most supercomputing software is written with MPI, rather than with POSIX threads, you don't really benefit from it anyway. I think you can disable the hardware coherency on a per-process basis, which would greatly speed up MPI software.
Commodity Linux clusters are not the only kind of cluster out there. SGI has been building clusters since the late 80s. Their first supercomputer product, the Power Challenge clusters, were 16 and 36 way SMP boxes clustered together with HIPPI. Remember Terminator 2 and Jurassic Park? Those were rendered on clusters of Crimson and Indigo workstations. They may have called them NOW (network of workstations) instead of Beowulf, but it was the same thing.
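Why wormhole routing makes extra hops cheap can be seen in a back-of-the-envelope latency model. This is a simplified sketch, not a model of the actual SGI router: it ignores contention and flow control, and the packet size, header size, link speed, and per-router delay below are invented numbers.

```python
import math

def store_and_forward_ns(hops, packet_bytes, bytes_per_ns, router_ns):
    """Each router receives the whole packet before sending it on."""
    return hops * (packet_bytes / bytes_per_ns + router_ns)

def wormhole_ns(hops, packet_bytes, header_bytes, bytes_per_ns, router_ns):
    """Each router forwards once the header has arrived; the body streams behind it."""
    per_hop = header_bytes / bytes_per_ns + router_ns
    return hops * per_hop + (packet_bytes - header_bytes) / bytes_per_ns

# Illustrative values only: 128-byte packet, 8-byte header, 1.6 bytes/ns, 40 ns per router
for hops in (1, 2, 4, 8):
    saf = store_and_forward_ns(hops, 128, 1.6, 40)
    worm = wormhole_ns(hops, 128, 8, 1.6, 40)
    print(hops, saf, worm)
```

In this model each additional hop costs only the header time plus the router delay, while store-and-forward pays for the whole packet again at every router.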
As for linux, they stepped towards linux about the same time IBM, HP, and Oracle did. They've contributed a LOT of code to linux and GPL products. They have transitioned the bulk of their product-line to linux in the last year or so, but they started that process five years ago. They have a LOT of legacy customers and legacy code to transition. Linux is a stable and high performance OS, and it would be that way without SGI, but it got there a lot faster because of SGI's efforts.
Furthermore, SGI doesn't give a damn (nor does anyone else) if slashdot loves them or not. They care if nasa, boeing, the US navy, BP, and NBC love them. These are the people with the bucks, more interested in a solution to a problem than to any license or technology.
The real reason that SGI doesn't get the credit they should is much simpler: they put a crappy SCSI controller on the mezzanine bus of the Challenge S server in 1994. In the early 90s SGI was the darling of the multimedia world. Their workstations were everywhere, and they made pretty cool servers too. They were poised to ride the same dot-com wave that Sun rode. They introduced a single-CPU server called the Challenge S, which was derived from the Indy workstation. It was reasonably speedy and quite affordable (for a Unix server of the time). The SCSI controller, however, was quite prone to failure. They developed a bad reputation. While the world was busy buying Sun servers hand-over-fist, it avoided SGIs except in the technical/defense/media markets. That legacy shaped the company into what it is today: a niche player, struggling against giants like IBM and HP, in the relatively small market for high performance computers.
SGI did the design work, but they wouldn't be there now without nasa in the past. For the last 10 years, at least, Nasa ames has had a dozen engineers who might as well be called sgi employees.
NASA has been a strong supporter of SGI for a decade or more. They were the biggest customer of the Challenge XL power-clusters back in the early 90s. They installed the first 128-proc O2000s. They installed one of the first 256-proc O3000s. The NAS group there was instrumental in getting CXFS and DMF product-ready. They've bankrolled a lot of SGI development, and been a test-bed for SGI's top-end, bleeding-edge systems many times.
I don't mean to disparage the altix, it's a great platform. Nasa, however, does not just buy shrink-wrapped purple boxes. They raise the bar, and vendors try to clear it.
Re:The math for a comparable Xserve system (on Cray XT-3 Ships, Score: 0)
You're right that most estimates ignore labor and discounts, and are completely unrealistic. Virginia Tech has the advantage that the Big Mac is just a technology demonstration, and doesn't actually do any real work.
Xsan is Apple's SAN filesystem. $1000 per node + $500 for a dual port FC card. A bargain by anyone's standard. Most big clusters, however, don't use cluster filesystems. I don't understand why not; it seems like a great idea. But they don't.
OctigaBay is certainly an odd buy, but it doesn't seem to get in the way of the big guns. It really is more of a Linux Networx sort of product, with a similar level of sophistication. They can keep the division largely separate and be happy, I would think. The Cray name adds credibility to the product, and it addresses a market segment that isn't served by the current Cray offerings. Obviously the institutional investors agreed with you, and dropped Cray stock 20% after the acquisition. I definitely agree that it's a bit of a black sheep, but maybe that's okay.
I think the X1 is an amazing machine. However I don't think it's true that X1 is only competing with NEC. Most big codes have been ported from vector archs to MPP or cluster code. If you read the comparisons Oak Ridge did, they are comparing X1 to altix and IBM p-series. The X1 does very well compared to scalar systems, but they also have to compete on cost. They need to keep up a critical mass of sales to pay for the custom engineering.
They also need to do something about the terrible, terrible scalar performance of the MIPS-derived X1 cpus. While it would be cool to have a mix-match system with some vector boards and some scalar boards (presumably the vector boards have multiple parallel interconnect networks), I'd rather see an X1 with an opteron as the scalar processor. (Not that this seems reasonable, as the scalars in the X1 can access the vector registers in 3 cycles, or something like that. totally impossible even on hypertransport)
Pre-SGI cray needed help, badly. They just had too many product lines for a post-cold-war supercomputing company.
They had the C90-to-T90 transition, which they were doing almost completely by themselves. Even IBM had given up on ECL. T90 was fast, but it was really really expensive. 32 processors sharing a flat memory is just too daunting, even without caches.
The J90 was profitable in the end, but took a lot of capital during the design work. Furthermore it detracted from the T90's transition to IEEE fp.
The T3D was a really cool product, but the engineering costs were huge.
The 6400 was a really great product, FOR SUN microsystems. Why did cray think they could sell a database server better than IBM or sun could?
It's not that they didn't have a pot to piss in, it's that they tried to piss in 25 pots all at once. Lo and behold, they got their feet wet. Shock and amazement.
It's too bad that sgi didn't use the cray division very well. They had a really good opportunity to blend some amazing technologies. Cray had the old, wool-suit engineers who had tackled hard problems for a long-long time. SGI had the wild breed of california technologist from the early RISC days. The sum of the parts was huge. Alas the combined whole was a disaster.
Cray today looks like they are doing cool things. xt3 is pretty neat. I'm curious to see how they combine it with the x1 vector stuff. Hopefully their three current product lines can share a lot of technology in future revisions. They just need to avoid getting too big for their britches. The supercomputing market is just not big enough to support anything too massive. Especially when you have to go head-to-head with IBM.
There are thousands of compute nodes, all of which get i/o services from dozens, or hundreds of i/o nodes. These i/o nodes run linux, several instances of linux. Basically the i/o nodes ARE a cluster, though not a compute cluster, and not necessarily a symmetric cluster. The i/o nodes run lustre in very much the same way that a cluster system would (though they can take advantage of hardware features not present on commodity clusters).
The real difference in this system is that nodes are not peers, as they are in commodity clusters. Each node has only one function, and the software is tuned to provide that one function very effectively.
This split microkernel architecture has been in use for a long time on big mpp systems like the paragon and the t3e. The software base (catamount/linux) is new, but the design is old.
Catamount is the kernel that runs on the compute nodes. It's a tiny kernel that packages up the OS service requests and sends them, over the interconnect, to an OS or I/O node, which does the real work of the operating system. Catamount is a descendant of PUMA, which came from Cougar. These are heavily derived from work done at Caltech. (I believe CMU and one of the UTexas schools also played a role, but am not sure.) The idea is that the microkernel is small and unobtrusive, and it gets the hell out of the way so the application can use the CPU as much as is possible.
The OS and I/O nodes run Linux, and provide services to the compute nodes. This is probably done in the kernel, but it could just as easily be running as a user-space daemon on the OS node. (Though you might have to do some mem-copys that way, which would lower performance.)
NOTE: Though these nodes take advantage of some of linux's features (like the lustre file system) they do NOT necessarily implement these features for the system as a whole. They probably provide a minimal set of features necessary for the sorts of problems that the xt3 runs. All the scheduling work that has gone into more recent linux kernels is of little use, as the compute nodes have their own scheduler, probably more closely tied to the batch dispatcher than to the linux kernel. To say that the system runs linux is true, but a little misleading. It's a very different linux than what runs on my desktop, and it's used in a very different way.
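The request-forwarding split described above can be sketched as a toy in Python. To be clear, this is not Catamount's or Linux's actual interface: the class names, the request format, and the two operations are all invented, and the "interconnect" is just a function call.

```python
from dataclasses import dataclass

@dataclass
class ServiceRequest:
    """A packaged-up OS service request (hypothetical format)."""
    op: str
    args: tuple

class ServiceNode:
    """Stands in for an OS or I/O node, which does the real work."""
    def __init__(self):
        self.files = {}

    def handle(self, request):
        if request.op == "write":
            path, data = request.args
            self.files[path] = self.files.get(path, b"") + data
            return len(data)
        if request.op == "read":
            (path,) = request.args
            return self.files.get(path, b"")
        raise ValueError(f"unsupported operation: {request.op}")

class ComputeNodeKernel:
    """Tiny kernel: packages requests and forwards them, does no I/O itself."""
    def __init__(self, send):
        self.send = send  # stands in for the interconnect

    def write(self, path, data):
        return self.send(ServiceRequest("write", (path, data)))

    def read(self, path):
        return self.send(ServiceRequest("read", (path,)))

io_node = ServiceNode()
kernel = ComputeNodeKernel(send=io_node.handle)
kernel.write("/scratch/out", b"step 1\n")
print(kernel.read("/scratch/out"))
```

The point of the split is visible even in the toy: the compute-side class holds no state and no filesystem logic, so there is very little there to steal cycles from the application.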
Re:Just the name brings back memories (on Cray XT-3 Ships, Score: 4, Insightful)
Actually, there is no reason to cluster a few of these. If you have a 2000 node xt3 (or t3e, paragon, blue-gene, cm5, insert mesh-structured mpp here) and a 4000 node xt3, you stick them together and make a 6000 node xt3. But that's just picking nits.
Curiously the XT3 IS about shaving dollars off the price. If you go read the original whitepapers on the system, they go through EXTENSIVE cost-return analysis. They studied their (then-) current generation of cluster systems, as well as future Linux/Solaris/AIX clusters, and rejected them as (interestingly) FAR TOO EXPENSIVE, once the administrative costs are factored in. They then looked at, and rejected, Cray's vector solution, the X1. They then decided that the (amazingly) most cost effective solution was to underwrite Cray's product development cycle on a wholly new product. Basically they asked for an update to the system they already had (ASCI Red, i.e. Intel Paragon++). Nobody was building such a thing. Since Cray had a really strong similar product in the 90s (T3D, T3E), the Department of Energy asked them to create an update. Some designs never die.
What I'm most interested in is the reliability. One of the biggest difficulties in the T3D engineering cycle was dealing with memory failure. Red Storm is going to have 10,000 processors. Let's assume each has 2 banks times 3 DIMMs (chip-kill) of memory. That means there are 10,000 x 6 x 18 = 1 million+ memory chips in the system. If 1/100th of a percent of these fail, that's still over a hundred failed chips. How well are faults isolated? That's the big question for systems this big.
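The chip count above is easy to reproduce. A quick sketch, assuming 18 chips per DIMM as the 10,000 x 6 x 18 figure implies; the function names are made up and the failure rate is just the comment's 1/100th of a percent.

```python
def memory_chip_count(processors, banks, dimms_per_bank, chips_per_dimm):
    """Total memory chips in the machine."""
    return processors * banks * dimms_per_bank * chips_per_dimm

def expected_failures(chips, failures_per_million):
    """Expected number of failed chips, kept in integer arithmetic."""
    return chips * failures_per_million // 1_000_000

chips = memory_chip_count(processors=10_000, banks=2, dimms_per_bank=3, chips_per_dimm=18)
failed = expected_failures(chips, failures_per_million=100)  # 100 ppm = 1/100th of a percent
print(chips, failed)
```

Even at that optimistic rate the machine should expect on the order of a hundred bad chips, so fault isolation matters more than raw failure rate.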
I'm also a little wary of cray's use of lustre. I've used lustre before, as well as other cluster-FSes. While I'm not aware of other filesystems that will scale to 700+ i/o nodes, I'm not confident in lustre. It's an immature product at best. (I don't mean to disparage the people working on it, it's a neat architecture, but it's a hard problem, and I'm not sure it's ready for prime-time.)
Re:... Back in my day .... young whippersnapper (on Cray XT-3 Ships, Score: 1)
Well, when the computer costs 10-15 million dollars, you can afford to spend twenty thousand on making it look really cool.
Compared to ASCI Red, the system that Red Storm is replacing, the XT3 looks incredible. Yes it's a long row of rectangular racks, but at least they are stylish racks. Intel built ASCI Red in beige box style. Oh well. Function over form, I suppose.
It is a radeon/geforce competitor. Or something like that.
The cell processor is only really fast when the spus are in use, which means 32-bit non-branching floating-point arithmatic. For anything involving integer math, flow control, or uneven memory access, the SPUs defer to the main processor. I'm sure IBM put a decent processor in there, but it doesn't sound like it's anything revolutionary, and there's only the one.
What does this get you? -- A processor that is really good at decoding mpeg, rendering graphics, maybe approximating the physics of flying dragons. It is not a fast general purpose processor. Operating systems, word processors, databases, these are all integer tasks, and much more-so they are branch tasks. Scientific computation - this requires double-precision floating point. Photoshop is about the only piece of non-multimedia software that might be able to take advantage of this.
The end result is that this will likely be a great chip for set-top boxes of all sorts, maybe even for video-editing workstations. A G5/pentium replacement it isn't; that's a different ball game.
No it's not at all revolutionary, but you stumbled upon one of the rare bits of insight in the world of computers: NOTHING NEW EVER EVER EVER HAPPENS IN COMPUTERS!
We figured out the basics for how to build a computer by about 1970, and since then there have really only been about three discoveries per decade. Most of those aren't even technology discoveries, but economic or marketing break-throughs. Most of computer engineering is taking old ideas, and incrementing them, or packaging them together in different ratios, or in new combinations.
Is the cell architecture new? No! It's 8 DSP's chained to a special I/O port on a powerpc processor. Each has a dedicated block of sram, and can mark dirty pages in the powerpc's main memory. Actually they probably can't, I bet the master processor does that. Will this thing be really fast for decoding mpeg streams, yeah probably it will. Will it be fast for general purpose supercomputing? - Absolutely not - supercomputing is memory limited, not ALU limited. Will it revolutionize servers or desktops? - absolutely not, they are limited by branches, and unstructured memory access, not by computation ability.
The cell is a really powerful brute-force solution to a problem that not many people have. It's probably a great fit for ps3, and not for a lot else. If it were, you'd see IBM promoting it in other product lines. They aren't, so it's obviously not that usefull outside of embeded dsp-style applications.
This is all extremely plausible. remember: these are not 8 full function ALUs like you get in a modern microprocessor. They opperate on 32-bit single-precision floats only. For all we know they are each in-order long pipelines, with static branch prediction, no speculative execution, a limited distributed memory scheme etc.etc.etc.
It's really 8 digital signal processors attached to a powerpc chip. You can do that today, you just run the data over the PCI bus, rather than on a big in-chip bus. IBM is just gonna stuff all that onto a chip, and hope that 65nm process will make if fast/cool/cheap enough to put in a console.
It has always been possible to make very fast parallel architectures if you make them very special-purpose. Cell sounds like it will be very difficult to program, and only a couple of game-engine designers will bother trying. The rest will just license those engines and work on game-play, story-line, and art. Same as before, only faster.
I edit digital photos, and like to keep multiple versions along the editing process. I have about 300GB in my working pile, and 2TB in my archive. I use a 100GB LTO tape drive, and it takes 20 tapes to get a single backup. Since I want to keep one set of tapes at my house, one set at the bank, and one set in transit, that's 60 cartridges. I would be very glad to have at least a 1TB tape solution for under 5 grand.
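For what it's worth, the tape count above is easy to check. This is only a back-of-the-envelope sketch: it assumes the 2TB archive is what gets backed up, that a tape holds its native 100GB with no compression, and that 1TB = 1000GB. The function name is made up for illustration.

```python
import math

def tapes_needed(archive_gb, tape_gb, rotation_sets):
    """Return (tapes per full backup, tapes across all rotation sets)."""
    per_set = math.ceil(archive_gb / tape_gb)
    return per_set, per_set * rotation_sets

# 2TB archive on 100GB LTO tapes, three sets (home, bank, in transit)
print(tapes_needed(2000, 100, 3))   # (20, 60)

# Same archive on a hypothetical 1TB cartridge
print(tapes_needed(2000, 1000, 3))  # (2, 6)
```

So a 1TB cartridge would take the rotation from 60 tapes down to 6, which is the whole appeal.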
I can't begin to imagine what the high-def video people do. Yes, you can store a lot of 320x280 MPEG files on a cheap disk. Not so with HD stuff. It's like 200GB per HOUR of footage.
Computers are becoming consumer electronics devices. They're being sold by the same people and to the same people shopping for TVs and microwaves. Yes, it's more fun to watch your crappy football team on a 42 inch plasma panel, but the bulk of TVs sold are still 24 inch CRTs for under $300. The bulk of microwaves cost under $50, and the most common stereos are still bookshelf/boombox systems that cost under $100.
Technodorks are the minority. $500 is STILL far too expensive for a lot of people. For a lot of people the choice is often between buying a computer, or eating. Thus we have a digital divide where the poor in this country can't work a computer, and thus have fewer chances to get decent jobs, thus their children also won't have access to technology, thus...
Even for those with the means, it's a comparison of value. What do you get out of your two thousand dollar laptop? How does it improve your life? Is that worth more than flying your whole family to visit your parents for Christmas? Is that worth more than a whole new wardrobe and a lot of dinners and movie tickets? I'm reluctant to endorse anything Walmart does, considering how they treat their employees and what they do to small communities. However, I'm glad that this increases technology access for people who might not otherwise be able to afford it.
Oops. -DOESN'T solve the hard problems, just puts them off-
Strained silicon is a great technology. You get 30% (or whatever) better electron mobility, which makes for faster capacitor discharge, and thus faster transistor switching, and reduced heat generated in the process. However, you can't strain it much more than they already have. It bought the lithography folks another few hundred megahertz, but it's not going to keep Moore's law alive for another couple decades, at least not by itself.
Strained silicon doesn't really address the two big problems facing silicon lithography: leakage current, and the ever-rising cost of dynamic power. Even with strained silicon there are still hundreds of millions of capacitors, each charging and discharging billions of times a second. If the frequency increases by some factor X and the number of caps increases by some factor Y, you have to drop the charge on each cap by a factor of X*Y or the dynamic power usage goes up. Furthermore, leakage current, which used to contribute almost nothing to the energy needs of a CPU, now makes up a good percentage of the electrical and heat budgets. The drains are just too close to the body. There are too few atoms of semiconductor to act as a resistor.
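The X*Y argument can be put in numbers. What follows is a rough sketch, not a real power model: it treats dynamic power as (number of caps) x (energy per switching event) x (frequency), reads "charge on each cap" loosely as the energy spent per switch, and ignores leakage and activity factors entirely. All of the figures are invented for illustration.

```python
def dynamic_power_w(num_caps, energy_per_switch_j, freq_hz):
    """Simplified dynamic power: every cap switches once per cycle."""
    return num_caps * energy_per_switch_j * freq_hz

def energy_budget_per_switch(old_energy_j, freq_factor_x, count_factor_y):
    """Per-switch energy needed to hold total dynamic power constant."""
    return old_energy_j / (freq_factor_x * count_factor_y)

# Made-up baseline: 1e8 caps, 0.5 fJ per switch, 2 GHz
caps, energy, freq = 1e8, 5e-16, 2e9
x, y = 1.5, 2.0  # frequency up 1.5x, cap count up 2x

base = dynamic_power_w(caps, energy, freq)
naive = dynamic_power_w(caps * y, energy, freq * x)
held = dynamic_power_w(caps * y, energy_budget_per_switch(energy, x, y), freq * x)

print(base, naive, held)  # naive is X*Y times base; held is back near base
```

Leave the per-switch energy alone and power goes up by X*Y; the only way to stay flat is to cut it by the same factor.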
It's a nice one-time speed bump, but it doesn't solve the hard problems, just puts them off for another year.
No. They saw it coming. There just wasn't much of anything to do about it. Even IBM and HP are unprofitable because of Dell and Sony.
How is SGI going to compete on their volume? Not at all. Workstations just aren't a big enough market. They tried; remember the visual workstations (SGI, Xeons + WinNT)? I don't think there's anything they could have done. Lost cause.
No. He's right about the disposable income thing.
What's the #1 most sought-after market in America as well as Japan? Teenagers. They don't make the most money, but it's all disposable. A few product segments sell exclusively to adults: cars, mortgages, dishwashers. However, the lifestyle products sell to kids who either spend their own money on the item, or talk their parents into buying it. Market research shows that the average teenager controls or directs almost $300 per week of spending. (Lies, damn lies, and statistics.)
In Japan they just have a lot more 33 year old teenagers running around.
Absolutely. Just like flat panels first took off in Wall Street brokerage firms. People in New York are willing to spend more on things that are small because everyone has a 500 ft2 apartment. In Japan they have 400 ft2, so they'll spend even more on small crap.
For the rest of our fat-ass country we don't care about small tech toys. We already have 4000 ft2 houses to store all of our ass-fat. Besides, we're busy spending all our money on fatass cars to drag our ass-fat to the Atkins-approved fatorium to optimize our fatilization.
The US is about big, not about sophisticated. We have the biggest cars on the longest highways, with the biggest heated seats. 20-ounce steaks, 2 for $8 if you catch the early bird, with a side of coleslaw and a porkchop.
Tru64 definitely is a better Unix. However, it has never been very successful in the commercial marketplace. Switching over to a Tru64 system would completely alienate the HP-UX customers, and there are MORE HP-UX customers.
HP-UX is an old relic (seriously, working in the HP kernel is like looking at AT&T Unix from the mid 80s), but it works. It has the virtualization features one might expect from a high-end Unix, and a lot of software support. It doesn't perform particularly quickly, and it's kinda obscure and clunky. What it really lacks is a mature 3rd-generation filesystem, which is why it comes bundled with VxFS.
DEC's AdvFS is not really any better than Veritas, except that it's so nicely integrated with TruCluster. I don't know how well Veritas' clustered filesystem works, but it runs on Solaris and Linux. Thus you can run both Linux and HP-UX on vPars within the same HP server, and share data. Though I really liked TruCluster/CFS, it would only be really helpful if they ported it to both Linux and HP-UX.
Apart from making the Tru64 -> HP-UX transition harder, I don't see that they lose any features by picking Veritas over CFS. It just seems like hiring a few more engineers would have been cheaper than playing this back-and-forth game with marketing.
Last time I compiled FreeBSD all the way up, it was on a 200MHz PPro. Yes, it takes a while, but on the order of hit-the-button-then-go-out-to-the-bar, not hit-the-button-then-wait-three-paychecks.
I have to imagine the integer units on an iPAQ are at least the equal of my old PPro's, as must be the memory. The drive is probably slower though.
Considering how tech-savvy the author seems to be, it's interesting that he doesn't understand what an MPP is. The T3D IS an MPP, made in response to a wave of MPP designs in the late 80s taking some of Cray's market share (Thinking Machines, Paragon, etc.). MPP, incidentally, stands for Massively Parallel Processing; massively as in hundreds, not 32.
The T90, on the other hand, is a pure SMP. The processors all sit on a shared bus (actually 256 parallel shared buses). Each CPU was really fast (for the time), had really big pipes to memory, and was really expensive.
Sorry, just picking nits.
The sad thing about nuclear (or wind, solar, etc.) alternatives is that they can only work as alternatives to electrical grid power. Much more than half of the oil used in this country powers cars, trains, trucks, and planes. Another big chunk goes to providing actual heat for buildings and industrial processes.
Since the efficiency of electrical power generation is less than 50%, heat generation will almost certainly never use electrical power. If we can get fuel cell systems to work well with hydrogen, and can efficiently use electricity to isolate hydrogen from water, then MAYBE cars and trucks could benefit from this. Jets will almost certainly remain oil-fueled.
Thus you might be able to impact 40%-50% of the energy needs of the nation. Though obviously it doesn't really matter where you save the oil, just so long as the total consumption goes down.
The big spoilers in all this are India and China, whose energy needs are going to increase constantly over the next several decades. It doesn't matter who is the president, we're gonna have to compete for the oil, whoever controls it. Gas prices are only going to go UP.
Linux has been EXTREMELY successful at breaking into the supercomputer market, particularly in the 40-200 dual-SMP-node size. Some really large systems run Linux, but not as large a majority as is the case with the smaller systems.
Probably the two most successful supercomputing product lines are the IBM pSeries clusters and HP's AlphaServer SC line. Sun sells clusters of Sun Fire 6800s and up.
The IBM systems use 1-32 way SMPs built on POWER[3-5] processors and running AIX. They use a proprietary interconnect switch called the SP switch, the most recent incarnation of which is called the Federation switch. The SP switch is faster than the commodity cluster interconnects, but not when you divide it by the number of CPUs in the big SMP boxes. One of the main strengths of the IBM systems is the absolutely HUGE I/O bandwidth of the large systems. I've seen real-world: 15 gigaBYTES per second to a filesystem. The product line has been around for a decade, and is well understood, and a good performer.
The HP (formerly Compaq, formerly DEC) AlphaServer uses smaller SMPs for the SC line. Each node is a 4-way SMP running Tru64 (i.e. OSF/1) Unix with Quadrics as the interconnect. These systems used to totally trounce any other cluster product, but are getting long in the tooth. HP stopped development of new systems about 4 years ago, and recently released the last CPU rev for these systems. They are a little reluctant, however, to publish performance numbers, as the 1.3GHz Alpha processors still get performance about the same as HP's newer Itanium2-based systems. Tru64 Unix, while never commercially successful, performs very well, and is quite suited to this application. The TruCluster software and the integrated cluster filesystem make systems administration on these big systems much simpler than on Linux clusters. Currently the #3 fastest super in the world is an SC. Since this is the end of the product line, they will all probably disappear.
HP also sells some of their HP-UX based systems, which now use Itanium2 processors, to supercomputing customers. Though they make 128-way SMPs, they sell those as super-servers, not for HPTC customers. Instead they stick with the 2 and 4 way nodes with InfiniBand or Myrinet. Most of these are sold with Linux, but some use HP-UX. I can't see any real advantage to using HP-UX in that sort of environment, and apparently no one else can see one either.
Sun sells clusters of their big SMP boxes into the HPTC market. These clusters run Solaris and use Sun's proprietary Sun Fire cluster interconnect. The interconnect is REALLY fast, but maxes out at 4 nodes. Thus you can string together 4 72-way (actually 144-core) SMPs for a 288-CPU (576-core) cluster. These aren't the fastest CPUs on the planet, but aren't half bad. The interconnect is good, and the software availability is really good. I think most of these aren't sold as PURE compute clusters. Rather they are likely some hybrid of super-compute and server-type operations.
Fujitsu sells linux/itanium boxes, but also has a primepower cluster which uses their solaris based sparc64 SMPs clustered using commodity interconnects. As with the sun solution, they provide a host of software options, decent CPUs, and a fairly hefty price tag when compared with commodity clusters.
SGI also still sells their legacy Origin (MIPS CPU / IRIX OS) systems. Since they max out at 1024 CPUs per system image, the really large systems are actually clusters of Origins. IRIX scales really well to a lot of CPUs, but each of the processors is just not very fast. The system is interconnected with CrayLink.
That really describes the bulk of the cluster products. As for non-cluster supercomputers: ASCI Red runs a proprietary microkernel OS called PUMA. Cray XT3 runs Catamount. Cray T3E runs UNICOS/mk (ChorusOS). Cray X1 runs UNICOS/mp (IRIX). NEC SX-series run a System V derivative Unix called SUPER-UX. Hitachi SR8000 uses HI-MX (Mach).
Dig around on top500.org and you can probably find out more.
At NASA, SGI has been experimenting with a 2048-proc single system image. Since the Japan system has yet to be deployed, it will likely be a single system.
The SGI magic memory controller incorporates the NUMAlink (originally called CrayLink) router they leveraged from the T3E work. This router uses wormhole routing, which starts forwarding a packet as soon as the address bytes are read. This means that the added latency of going through several routers is often much less than that of packaging up the packet in the first place. On the hardware side of things it's not the number of router hops that limits the scalability of the system. Rather, the greater the size of the memory, the coarser the size of the directory blocks. With 13TB of memory you are probably invalidating dozens or hundreds of pages at a time. SUCK.
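To see why the hop count matters so little with wormhole routing, here is a toy latency comparison against plain store-and-forward. It is the usual textbook simplification, not a model of the actual NUMAlink router: no contention, a fixed per-router delay, and every number below is made up.

```python
def store_and_forward_ns(hops, packet_bytes, bytes_per_ns, router_ns):
    """Each router receives the whole packet before sending it on."""
    return hops * (packet_bytes / bytes_per_ns + router_ns)

def wormhole_ns(hops, packet_bytes, header_bytes, bytes_per_ns, router_ns):
    """Only the header pays the per-hop cost; the body streams behind it."""
    header_time = hops * (header_bytes / bytes_per_ns + router_ns)
    body_time = (packet_bytes - header_bytes) / bytes_per_ns
    return header_time + body_time

# Hypothetical link: 1 byte/ns, 5 ns per router, 128-byte packet, 8-byte header
for hops in (1, 4, 8):
    print(hops,
          store_and_forward_ns(hops, 128, 1.0, 5),
          wormhole_ns(hops, 128, 8, 1.0, 5))
```

With these invented numbers, 4 hops cost 532 ns store-and-forward but 172 ns with wormhole routing, and each extra hop adds only 13 ns instead of 133 ns.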
The cache coherency of SGI's cc-NUMA machines makes them incredibly easy to program. However, there is a big overhead. Since most supercomputing software is written with MPI, rather than with POSIX threads, you don't really benefit from it anyway. I think you can disable the hardware coherency on a per-process basis, which would greatly speed up MPI software.
Commodity Linux clusters are not the only kind of cluster out there. SGI has been building clusters since the late 80s. Their first supercomputer product, the Power Challenge clusters, were 16 and 36 way SMP boxes clustered together with HIPPI. Remember Terminator 2 and Jurassic Park? Those were rendered on clusters of Crimson and Indigo workstations. They may have called them NOW (network of workstations) instead of Beowulf, but it was the same thing.
As for linux, they stepped towards linux about the same time IBM, HP, and Oracle did. They've contributed a LOT of code to linux and GPL products. They have transitioned the bulk of their product-line to linux in the last year or so, but they started that process five years ago. They have a LOT of legacy customers and legacy code to transition. Linux is a stable and high performance OS, and it would be that way without SGI, but it got there a lot faster because of SGI's efforts.
Furthermore, SGI doesn't give a damn (nor does anyone else) if slashdot loves them or not. They care if NASA, Boeing, the US Navy, BP, and NBC love them. These are the people with the bucks, more interested in a solution to a problem than in any license or technology.
The real reason that SGI doesn't get the credit they should is much simpler: They put a crappy SCSI controller on the mezzanine bus of the Challenge S server in 1994. In the early 90s SGI was the darling of the multimedia world. Their workstations were everywhere, and they made pretty cool servers too. They were poised to ride the same dot-com wave that Sun rode. They introduced a single-CPU server called the Challenge S, which was derived from the Indy workstation. It was reasonably speedy and quite affordable (for a Unix server of the time). The SCSI controller, however, was quite prone to failure. They developed a bad reputation. While the world was busy buying Sun servers hand over fist, they avoided SGIs except in the technical/defense/media markets. That legacy shaped the company into what it is today: a niche player, struggling against giants like IBM and HP, in the relatively small market for high performance computers.
SGI did the design work, but they wouldn't be there now without NASA in the past. For the last 10 years, at least, NASA Ames has had a dozen engineers who might as well be called SGI employees.
NASA has been a strong supporter of SGI for a decade or more. They were the biggest customer of the Challenge XL power-clusters back in the early 90s. They installed the first 128-proc O2000s. They installed one of the first 256-proc O3000s. The NAS group there was instrumental in getting CXFS and DMF product-ready. They've bankrolled a lot of SGI development, and been a test-bed for SGI's top-end, bleeding-edge systems many times.
I don't mean to disparage the Altix; it's a great platform. NASA, however, does not just buy shrink-wrapped purple boxes. They raise the bar, and vendors try to clear it.
You're right that most estimates ignore labor and discounts, and are completely unrealistic. Virginia Tech has the advantage that the Big Mac is just a technology demonstration, and doesn't actually do any real work.
Xsan is Apple's SAN filesystem. $1000 per node + $500 for a dual-port FC card. A bargain by anyone's standard. Most big clusters, however, don't use cluster filesystems. I don't understand why not; it seems like a great idea. But they don't.
OctigaBay is certainly an odd buy, but it doesn't seem to get in the way of the big guns. It really is more of a Linux Networx sort of product, with a similar level of sophistication. They can keep the division largely separate and be happy, I would think. The Cray name adds credibility to the product, and it addresses a market segment that isn't served by the current Cray offerings. Obviously the institutional investors agreed with you, and dropped Cray stock 20% after the acquisition. I definitely agree that it's a bit of a black sheep, but maybe that's okay.
I think the X1 is an amazing machine. However I don't think it's true that X1 is only competing with NEC. Most big codes have been ported from vector archs to MPP or cluster code. If you read the comparisons Oak Ridge did, they are comparing X1 to altix and IBM p-series. The X1 does very well compared to scalar systems, but they also have to compete on cost. They need to keep up a critical mass of sales to pay for the custom engineering.
They also need to do something about the terrible, terrible scalar performance of the MIPS-derived X1 CPUs. While it would be cool to have a mix-and-match system with some vector boards and some scalar boards (presumably the vector boards have multiple parallel interconnect networks), I'd rather see an X1 with an Opteron as the scalar processor. (Not that this seems reasonable, as the scalars in the X1 can access the vector registers in 3 cycles, or something like that. Totally impossible even on HyperTransport.)
Pre-SGI cray needed help, badly. They just had too many product lines for a post-cold-war supercomputing company.
They had the C90-to-T90 transition, which they were doing almost completely by themselves. Even IBM had given up on ECL. T90 was fast, but it was really really expensive. 32 processors sharing a flat memory is just too daunting, even without caches.
The J90 was profitable in the end, but took a lot of capital during the design work. Furthermore it detracted from the T90's transition to IEEE fp.
The T3D was a really cool product, but the engineering costs were huge.
The 6400 was a really great product, FOR SUN Microsystems. Why did Cray think they could sell a database server better than IBM or Sun could?
It's not that they didn't have a pot to piss in, it's that they tried to piss in 25 pots all at once. Lo and behold, they got their feet wet. Shock and amazement.
It's too bad that SGI didn't use the Cray division very well. They had a really good opportunity to blend some amazing technologies. Cray had the old, wool-suit engineers who had tackled hard problems for a long, long time. SGI had the wild breed of California technologist from the early RISC days. The sum of the parts was huge. Alas, the combined whole was a disaster.
Cray today looks like they are doing cool things. The XT3 is pretty neat. I'm curious to see how they combine it with the X1 vector stuff. Hopefully their three current product lines can share a lot of technology in future revisions. They just need to avoid getting too big for their britches. The supercomputing market is just not big enough to support anything too massive. Especially when you have to go head-to-head with IBM.
It's not a cluster.....
well, sort of.
There are thousands of compute nodes, all of which get i/o services from dozens, or hundreds of i/o nodes. These i/o nodes run linux, several instances of linux. Basically the i/o nodes ARE a cluster, though not a compute cluster, and not necessarily a symmetric cluster. The i/o nodes run lustre in very much the same way that a cluster system would (though they can take advantage of hardware features not present on commodity clusters).
The real difference in this system is that nodes are not peers, as they are in commodity clusters. Each node has only one function, and the software is tuned to provide that one function very effectively.
This split microkernel architecture has been in use for a long time on big mpp systems like the paragon and the t3e. The software base (catamount/linux) is new, but the design is old.
Catamount is the kernel that runs on the compute nodes. It's a tiny kernel that packages up the OS service requests, and sends them, over the interconnect, to an OS or I/O node, which does the real work of the operating system. Catamount is a descendant of PUMA, which came from Cougar. These are heavily derived from work done at Caltech. (I believe CMU and one of the UTexas schools also played a role, but am not sure.) The idea is that the microkernel is small and unobtrusive, and it gets the hell out of the way so the application can use the CPU as much as is possible.
The OS and I/O nodes run Linux, and provide services to the compute nodes. This is probably done in the kernel, but it could just as easily be running as a user-space daemon on the OS node. (Though you might have to do some memory copies that way, which would lower performance.)
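The compute-node/service-node split described above can be sketched in a few lines. This is purely a toy to show the shape of the idea: the class and method names are invented, a direct function call stands in for a message over the interconnect, and none of it reflects how Catamount is actually implemented.

```python
from dataclasses import dataclass

@dataclass
class ServiceRequest:
    """An OS service request packaged up by the compute-node kernel."""
    node_id: int
    call: str
    args: tuple

class ServiceNode:
    """Stands in for an OS or I/O node that does the real work."""
    def __init__(self):
        self.files = {}

    def handle(self, req):
        if req.call == "write":
            path, data = req.args
            self.files.setdefault(path, bytearray()).extend(data)
            return len(data)
        if req.call == "read":
            (path,) = req.args
            return bytes(self.files.get(path, b""))
        raise ValueError(f"unsupported call: {req.call}")

class ComputeNodeKernel:
    """Tiny kernel: no filesystem of its own, it only forwards requests."""
    def __init__(self, node_id, service_node):
        self.node_id = node_id
        self.service_node = service_node

    def syscall(self, call, *args):
        req = ServiceRequest(self.node_id, call, args)
        # A real system would send this over the interconnect and wait.
        return self.service_node.handle(req)

io_node = ServiceNode()
kernel = ComputeNodeKernel(node_id=7, service_node=io_node)
kernel.syscall("write", "/scratch/out", b"abc")
print(kernel.syscall("read", "/scratch/out"))  # b'abc'
```

The point is how little lives on the compute side: package the request, ship it, and get back to the application.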
NOTE: Though these nodes take advantage of some of linux's features (like the lustre file system) they do NOT necessarily implement these features for the system as a whole. They probably provide a minimal set of features necessary for the sorts of problems that the xt3 runs. All the scheduling work that has gone into more recent linux kernels is of little use, as the compute nodes have their own scheduler, probably more closely tied to the batch dispatcher than to the linux kernel. To say that the system runs linux is true, but a little misleading. It's a very different linux than what runs on my desktop, and it's used in a very different way.
Actually, there is no reason to cluster a few of these. If you have a 2000 node xt3 (or t3e, paragon, blue-gene, cm5, insert mesh-structured mpp here) and a 4000 node xt3, you stick them together and make a 6000 node xt3. But that's just picking nits.
Curiously the XT3 IS about shaving dollars off the price. If you go read the original whitepapers on the system, they go through EXTENSIVE cost-return analysis. They studied their (then-) current generation of cluster systems, as well as future Linux/Solaris/AIX clusters, and rejected them as (interestingly) FAR TOO EXPENSIVE, once the administrative costs are factored in. They then looked at, and rejected, Cray's vector solution, the X1. They then decided that the (amazingly) most cost-effective solution was to underwrite Cray's product development cycle on a wholly new product. Basically they asked for an update to the system they already had (ASCI Red, i.e. Intel Paragon++). Nobody was building such a thing. Since Cray had a really strong similar product in the 90s (T3D, T3E), the Department of Energy asked them to create an update. Some designs never die.
What I'm most interested in is the reliability. One of the biggest difficulties in the T3D engineering cycle was dealing with memory failure. Red Storm is going to have 10,000 processors. Let's assume each has 2 banks times 3 DIMMs (chipkill) of memory. That means there are 10,000 x 6 x 18 = 1 million+ memory chips in the system. If 1/100th of a percent of these fail, that's still a lot of memory failures (about a hundred chips). How well are faults isolated? That's the big question for systems this big.
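The chip count is simple to reproduce. This sketch just reruns the arithmetic from the paragraph above; the 2 banks x 3 DIMMs and 18 chips per DIMM are the same guesses made there, not published Red Storm specs, and the function names are made up.

```python
def memory_chip_count(processors, dimms_per_processor, chips_per_dimm):
    """Total memory chips in the machine."""
    return processors * dimms_per_processor * chips_per_dimm

def expected_failed_chips(total_chips, failure_fraction):
    """Expected number of failed chips for a given failure fraction."""
    return total_chips * failure_fraction

chips = memory_chip_count(10_000, 2 * 3, 18)
print(chips)  # 1080000

# 1/100th of a percent is a fraction of 0.0001
print(round(expected_failed_chips(chips, 0.0001)))  # 108
```

So even a failure rate that sounds negligible means on the order of a hundred bad chips somewhere in the machine.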
I'm also a little wary of cray's use of lustre. I've used lustre before, as well as other cluster-FSes. While I'm not aware of other filesystems that will scale to 700+ i/o nodes, I'm not confident in lustre. It's an immature product at best. (I don't mean to disparage the people working on it, it's a neat architecture, but it's a hard problem, and I'm not sure it's ready for prime-time.)
Well, when the computer costs 10-15 million dollars, you can afford to spend twenty thousand on making it look really cool.
Compared to ASCI Red, the system that Red Storm is replacing, the XT3 looks incredible. Yes, it's a long row of rectangular racks, but at least they are stylish racks. Intel built ASCI Red in beige-box style. Oh well. Function over form, I suppose.