flaming-opus · Slashdot Mirror

Re:WOW! But is it ready for the enterprise? on 3 Terabytes, 80 Watts · 2006-08-29 07:01 · Score: 2, Informative

No it's not even close to enterprise ready! A basic dual-powersupply server with a hardware raid card and a raid5 of sata drives isn't really enterprise ready. Enterprise means no single point of failure. Redundant raid controllers, power supplies, storage networks, mirrored caches, remote administration and performance monitoring, remote snapshots or archiving. Enterprise is expensive, but for good reason.

As for your question, enterprise IDE can only realistically be used for backup or archiving purposes where the drives are used intermittently. Several drive makers have sata disks with fibre channel interfaces on them, termed FATA drives. If you put a bunch of FATA drives under high load 24 hours a day, after about a week you'll start to see 1% of the drives fail EVERY DAY. I'm not joking. I had to deal with a cluster of FATA raids used for high-def video workloads, which was losing 4-5 drives every day, out of 550 installed. We eventually junked the entire setup and installed 1300 real FC drives instead. Even those die at more than 1 per week. IDE drives work fine in your desktop because you are only loading them up 10 minutes at a time, a couple dozen times a day.
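For a sense of scale, here is a back-of-the-envelope sketch of those failure rates. The drive counts and daily losses are the ones quoted above; the helper name is made up for illustration, and treating the FC drives as failing at exactly 1 per week is an assumption (the real rate was somewhat higher, so the final ratio is an upper bound).

```python
def daily_failure_fraction(failures_per_day: float, installed: int) -> float:
    """Fraction of the installed drives lost per day."""
    return failures_per_day / installed


# FATA cluster: 4-5 drives lost per day out of 550 installed.
fata_low = daily_failure_fraction(4, 550)
fata_high = daily_failure_fraction(5, 550)

# FC replacement: assumed exactly 1 drive per week out of 1300 (a lower bound).
fc = daily_failure_fraction(1 / 7, 1300)

print(f"FATA: {fata_low:.2%} to {fata_high:.2%} of drives per day")
print(f"FC:   {fc:.4%} of drives per day (lower bound)")
print(f"Per drive, FATA was failing up to roughly {fata_high / fc:.0f}x as often")
```

The FATA numbers land a little under the 1% per day figure, which is consistent with "4-5 drives every day, out of 550".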

It doesn't look like Capricorn is positioning this as an enterprise solution anyway. It looks like a workgroup NAS sort of thing, or a proxy cache of some sort. I'd file it in the "not mission-critical" folder.

Re:White collar welfare. on Cray Wins $52 Million Supercomputer Contract · 2006-08-11 04:37 · Score: 1

Well, supercomputers have become a dime a dozen. Or rather, clusters have become a dime a dozen. However, a lot of the really demanding tasks that high-end supercomputer users need to do, are not terribly well served by clusters. Nersc has cluster systems, they know how to use them, and what they are good for. The fact that they are buying a cray indicates that their needs were not well met by the clusters.

Furthermore, programming a modern cray is not very much like programming a YMP. The code is structured very much the same on an XT3 as it is on Blue Gene, or on a cluster. There really is a lot of interoperability in the HPC space.

The real flaw in your logic is that there will be a lot of competition. The supercomputing marketplace is really tiny. It's about a $4 billion worldwide industry. That sounds like a lot, but it's really tiny compared to the greater computer hardware industry. Why compete for supercomputer dollars, when there are so many corporate customers with more money, and simpler demands? IBM and HP already own most of the HPC market by selling clusters of their business-class servers, so there's really only a tiny slice left over for the real innovators. If your particular need is not met by a cluster of IBM unix servers, you are in the tricky situation of forking over a bundle for a cray/nec/sgi box.

Niche markets have always been expensive. JP-7 fuel costs $30/liter. Modern day fighter jets cost $100 million each. The government buys expensive stuff. Not really news.

Re:Who else bid? on Cray Wins $52 Million Supercomputer Contract · 2006-08-11 04:06 · Score: 1

Well, the Thinking Machines story is a little more complicated than that.
http://www.inc.com/magazine/19950915/2622.html

Basically, IBM and Cray got caught by surprise when the MPPs, of which TMC was just one, came onto the market. Eventually they got their act together and put out the SP and the T3D, which were both good products. Thinking Machines got hit by the same post-cold-war lull in supercomputer buying that hit everyone else, and they just weren't big enough to ride it out. Even at their peak, they were a $100 million/year (inflation adjusted) company. The corporate landscape is littered with the corpses of supercomputing companies that rose and fell, particularly those that rose in the late 80's, and disappeared in the 90's.

Yes we would on Cray Wins $52 Million Supercomputer Contract · 2006-08-11 03:55 · Score: 1

We wouldn't know every machine Cray shipped to NSA, and the exact specs, but we would generally know. Indirect evidence abounds.

In fact, it was widely reported in the HPTC press that the defense department was helping to fund the development of the "Black Widow" vector supercomputer, and that they funded much of the development costs for the X1. If you search the web for a while, you'll find that the NSA is the only customer for cray's bizarre MTA3 supercomputer. Every once in a while you'll see cray press releases about sales of X1's to "undisclosed" or "government" customers.

Without Uncle Sam's checkbook, cray would certainly go under. I bet the US government accounts for 1/3 of their sales, and another big chunk comes from foreign governments, and military contractors like Boeing. That said, it may not be a bad investment on the government's part. Keeping Cray alive means that IBM has some competition, and it keeps the innovation going at both companies, and keeps down the sticker price on the high-end IBM and HP systems. Subsidizing cray is not cheap, but it may be cheaper than not subsidizing cray.

XT3 not really a cluster on Cray Wins $52 Million Supercomputer Contract · 2006-08-11 03:38 · Score: 1

The XT3 is not really a cluster. True, it's a message passing machine with an interconnect between commodity processors, but that interconnect is very highly integrated into the system design, and the software stack is very customized. The line around what is, and is not, a cluster is a fuzzy line, but this is not a cluster in the language of government supercomputer procurement where 'cluster' means 'commodity cluster'.

Nersc, and a number of other dod/doe labs are buying a new generation of "true" supercomputers (XT3, bluegene, P570/VIVA, altix, X1) as a response to the perception that commodity clusters have largely failed in delivering real performance. While performance on real (not-linpack) applications has been a problem for commodity clusters, the real problem has been reliability. A machine that gives you 10 teraflops of performance, but is down for maintenance half the time, is less useful than a much smaller machine that actually runs. While the fundamental architecture of a commodity cluster does not preclude reliable real-world performance, many of the clusters actually in use have not provided very compelling real performance for the real world cost of owning the systems.
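The reliability argument boils down to simple arithmetic: what you actually get is peak performance times the fraction of time the machine is up. A minimal sketch, where the 10 teraflop machine that is down half the time comes from the paragraph above, and the smaller machine (6 teraflops, 95% uptime) is a purely hypothetical comparison point:

```python
def delivered_teraflops(peak_tflops: float, availability: float) -> float:
    """Average usable performance: peak scaled by the fraction of time up."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return peak_tflops * availability


# 10 TF cluster that is down for maintenance half the time (from the text).
big_flaky = delivered_teraflops(10.0, 0.50)

# Hypothetical smaller machine: 6 TF peak, up 95% of the time.
small_solid = delivered_teraflops(6.0, 0.95)

print(f"big but flaky:   {big_flaky:.1f} TF delivered")
print(f"small but solid: {small_solid:.1f} TF delivered")
```

This ignores the cost of jobs killed mid-run by a failure, which only makes the unreliable machine look worse.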

Nersc has owned commodity clusters, as well as large mpp machines from IBM and Cray, including the XT3's predecessor the T3E. They know what they're getting into.

why just one next gen on Casual Gaming the Real Next Gen? · 2006-07-06 08:44 · Score: 2, Insightful

I think there's market room for both. The hardcore can dump a grand on a ps3 and a pile of 3d shooters, and there will be several million of these folks.

There is also an addressable market of several tens of millions of people interested in spending a couple hundred dollars a year and a couple hours a week on video games.

It's like any other recreation market. There are cyclists who will drop five grand on a carbon-fiber frame, and those of us who like to take a ride around the lake on our three hundred dollar mountain bikes. There is a market for motorists driving quarter million dollar lotus roadsters, but mazda sells a higher total revenue worth of Miatas. So on and so forth.

Re:Can't they use water cooling on New Top500 List Released at Supercomputing '06 · 2006-06-29 05:48 · Score: 1

Actually the sgi columbia system at nasa does use liquid cooling, as do a number of the HP clusters. They do not use liquid cooling directly to the surface of the CPU, but they do blow the system exhaust through a radiator filled with 45 degree liquid. APC also sells a similar setup for datacenters.

What happens in really dense data centers is that you alternate hot aisles and cold aisles. You face all the racks so that the fronts of the machines are pointed at cold aisles, and the backs face hot aisles. Thus no machine is breathing in the hot exhaust from the next aisle over. If you pack too many machines into a machine room, the HVAC system can't ingest the hot air fast enough, and you start feeding hot air into the intake for your machines. Thus you throw a relatively simple chilled-water radiator on the back of the rack, and the exhaust air isn't very hot. Hopefully you have a drip tray and a drain on each one. The setup will probably still work if one of the racks has a failed air chiller, so long as most are working, and the hot aisle doesn't get too hot.

Re:how many aren't listed? on New Top500 List Released at Supercomputing '06 · 2006-06-29 05:26 · Score: 1

I suspect the NSA has some FPGA-derived custom processors for this purpose. Probably rewritten for each algorithm, and probably built in-house. Attach these to the I/O bus on a big sun or IBM server which acts as the I/O engine. Pure speculation of course.

Re:What, no microsoft? on New Top500 List Released at Supercomputing '06 · 2006-06-28 07:54 · Score: 2, Informative

Shared memory bus is a pretty subjective measure. Since the mid-90's no supercomputers have used true flat-memory shared buses. Instead they are connected by some sort of switched point-to-point network. In many of the mpp machines, these networks are built into the memory controllers on each node. (blue gene, crays, earth simulator, etc) The bandwidth and latency of these networks are orders of magnitude better than gigE, but it's still a network of sorts.

Re:how many aren't listed? on New Top500 List Released at Supercomputing '06 · 2006-06-28 07:01 · Score: 4, Interesting

I would say it's unlikely that the classified computers are in the top 10, and here's why: the top500 list is constructed using linpack to measure floating point performance of highly parallel computation. Much of the work done by intelligence agencies is data-mining. It's integer tasks that are probably I/O bound rather than cpu-bound. If there were a top500 list of high performance storage systems, I bet the classified systems would own the top of the list, just not for raw fp-compute power.

Take, as an example, the Cray MTA. It's a product that's not even mentioned in their products page on their website. Yet, if you surf the net very carefully, you'll find out they're building the next version for their single customer of the product-line: the NSA. Even at the maximum configuration, the machine wouldn't make the top500 list, but it has features that make it uniquely suited to a few very peculiar application kernels. (single-virtual-cycle access to any memory within the distributed system)

Sure, the department of defence uses supercomputers to predict the weather, improve weapons systems and simulate, but these are probably not done on systems we don't know exist. That sort of stuff is done at AHPCRC or ERDC, or at Boeing/Lockheed Martin/Raytheon/etc. All of these sites have huge HPC resources, just not the hugest of the huge.

Re:The death of SGI on SGI Files Chapter 11 Bankruptcy · 2006-05-08 03:39 · Score: 1

Of course most people, when writing about the demise of cray, point to the ponytailed hippies from SGI who spent money like it grew on trees, and hadn't a clue how to sell into a competitive market, make compromises, or actually build things that the customers actually wanted.

SGI also really shot itself in the foot in the internet server market. They released really great hardware for the task, particularly the challenge-S, which was fast and affordable. However, they had real reliability problems and supply chain problems. (I've never seen so many computers catch on fire, as the octane) They managed to cause a lot of problems for their users at the beginning of the internet boom, and lost themselves a lot of mindshare in the market. Sun managed to build up enough mindshare that they are still relevant, despite the end of the .com boom.

SGI was a company that was universally focused on cool technology, but without much of a care for what could actually be sold into the market, and what it would cost to do so.

Re:So the CPU will still be waiting for RAM? on HyperTransport 3.0 Ratified · 2006-04-25 05:33 · Score: 1

You do raise an interesting point about the memory expansion. Opterons are limited in the amount of local memory they can address. Each opteron includes 2 memory controllers. DDR memory controllers can't drive more than 3 modules per bus, limiting an opteron to 6 dimms per cpu. Currently that puts a ceiling of 12GB/cpu of memory until 4GB dimms become available in quantity. For most people, this is not a real hindrance; 24GB for a dual-proc sled is plenty. There are some cases, however, when your real problem is the size of addressable memory. I've wondered if someone out there might not put together some motherboards with some memory-extender controllers that would plug into hypertransport, and add additional remote memory capacity to the cpu's on the board. I suppose the people who need more than 12GB/CPU are pretty few, and you just tell them to buy more processors, even if you're not cpu bound.
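The ceiling is just a product of three numbers. A quick sketch using the figures from the paragraph above (2 controllers per cpu, 3 modules per controller); the 2GB dimm size is inferred from the 12GB/cpu figure, and the function name is made up for illustration:

```python
def max_memory_gb(controllers: int, dimms_per_controller: int, dimm_size_gb: int) -> int:
    """Largest local memory one cpu can address with fully populated slots."""
    return controllers * dimms_per_controller * dimm_size_gb


per_cpu_today = max_memory_gb(2, 3, 2)      # 2GB dimms (inferred from 12GB/cpu)
dual_proc_today = 2 * per_cpu_today         # a dual-proc sled
per_cpu_with_4gb = max_memory_gb(2, 3, 4)   # once 4GB dimms are available

print(f"per cpu today:       {per_cpu_today} GB")
print(f"dual-proc today:     {dual_proc_today} GB")
print(f"per cpu, 4GB dimms:  {per_cpu_with_4gb} GB")
```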

SGI does something like this on the altix, which allows you to use memory-only nodes on the ccNuma fabric.

What you are describing is cache on HyperTransport 3.0 Ratified · 2006-04-25 03:29 · Score: 1

That's exactly what a cache is. It's very high speed memory, often SRAM, attached on a very wide bus. Instead of letting the programmer or the OS decide which parts of software to put in the high-speed ram, and what to leave in the low-speed ram, the cache controller does, essentially letting all the data have a place in the high-speed ram, but occasionally replacing it.
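A toy software sketch of that idea, assuming a least-recently-used replacement policy. Real cache controllers are set-associative hardware and often only approximate LRU, so the class and names here are purely illustrative:

```python
from collections import OrderedDict


class TinyCache:
    """A small fast store in front of slow memory; nobody chooses what lives here."""

    def __init__(self, capacity: int, backing: dict):
        self.capacity = capacity
        self.backing = backing        # stands in for the low-speed ram
        self.lines = OrderedDict()    # address -> value, oldest use first
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)       # mark as most recently used
            return self.lines[addr]
        self.misses += 1
        value = self.backing[addr]             # slow path: fetch from backing store
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)     # evict the least recently used line
        return value


slow_ram = {addr: addr * 10 for addr in range(8)}
cache = TinyCache(capacity=2, backing=slow_ram)
for addr in [0, 1, 0, 2, 1]:
    cache.read(addr)

print(f"hits={cache.hits} misses={cache.misses} resident={list(cache.lines)}")
```

Every address gets a turn in the fast store, and the program never has to say which data deserves to be there.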

What you describe doesn't really solve any real problems. Graphics cards benefit from fancy memories like gddr3 because they are bandwidth starved. We can see from the lack of performance increase in DDR2 systems that opterons and pentiums are not bandwidth starved when it comes to memory; they are latency bound. Super-fast memory designs like XDR don't really help the latency problem, they only increase bandwidth, which is why you see them used for bandwidth starved micros like cell or the cray vector systems. SRAM would help the latency issue, but it's so expensive that you can't throw a quarter gig in a system. Even a couple megs is really expensive, so it's better to use that as cache, rather than direct-address memory.

Furthermore, you're not saving all that much money. Some of the cost of expensive graphics ram is the memory, but a lot of it is also the elaborate memory controller, and all of the bus pins. What you're proposing still requires expensive CPU designs, CPU sockets, many-layer motherboard layouts, and memory package designs. You still have most of the cost of high-end server designs, but only a fraction of your memory is fast. It doesn't seem worth it.

Re:Might we ever have socketed Hypertransport GPU' on Start-up Could Kick Opteron into Overdrive · 2006-04-24 09:16 · Score: 1

The advantage of pcie is that the specification will likely stick around for 6-7 more years, and will be used on both intel and AMD systems, as well as some of the more obscure architectures. You can design your graphics card and know that just about anyone will be able to throw it into their box. It gives you a very large addressable market. Putting the GPU into a processor socket cuts your addressable market down to only AMD boards.

Furthermore, GPU's do require a fair amount of bandwidth, but are a lot more latency tolerant than a processor. PCIe is really suited to the job, I don't see why you'd want to supplant a slot designed for I/O boards with a socket designed for something else.

Re:reinventing the wheel... and making it a square on Mysterious 'Forcefield' Tested on US Tanks · 2006-04-12 07:30 · Score: 3, Insightful

Uhh, excuse me.
Which part of "keep the guys in the tank from dying" don't you like? The US uses 70 ton tanks, the most sophisticated in the world, and they can be pretty well blown up by a guy with a 50 pound rocket on his shoulder. There are quite a few companies in the US, and in russia, who will sell you rockets with multiple shaped charges, that will pretty easily defeat reactive armor.

The real trick to a system like this is target identification. It's not always helpful if the tank's armor starts trying to take out some unlucky pigeon, or radio flyer. When they first started putting this sort of thing on ships, they wiped out a lot of porpoises, shot the tops off some waves, etc.

Re:Type of NIC and clustered file systems? on ILM's Datacenter · 2006-03-31 03:27 · Score: 1

Yes, fibre channel is almost always faster than gigE. Even 1gig fibre provides higher bandwidth than 1gig ethernet. The payload for each packet is a lot bigger than ethernet, even with jumbo frames. GigE, on the other hand, is cheap.

There's a reason no one offers local-node exported cluster filesystems in the commercial space: What happens when a node fails? How do you get at your data? You can do raid across the different nodes, well, then how do you reconstruct the raid? How does it behave in degraded mode? What happens if a second node fails? It's a very, very, very difficult problem to solve.
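To make the failure questions concrete, here is a minimal sketch of single-parity (raid5-style XOR) protection across nodes. This is only the textbook parity math, not how any particular product works; the node contents and helper name are made up for illustration:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three data nodes, plus a parity block on a fourth.
node_data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(node_data)

# One node fails: XOR of the survivors and the parity rebuilds its block.
rebuilt = xor_blocks([node_data[0], node_data[2], parity])
print("node 1 rebuilt correctly:", rebuilt == node_data[1])

# A second node fails: all that is left is the XOR of the two lost blocks,
# which is not enough to recover either one.
leftover = xor_blocks([node_data[0], parity])
print("only the XOR of the lost blocks survives:",
      leftover == xor_blocks([node_data[1], node_data[2]]))
```

Note that the rebuild has to read from every surviving node over the network, which is a big part of why degraded mode is so painful in this kind of design.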

There was once a company that tried to do this called tricord. They blew through a hundred million bucks trying to develop such a thing and went belly up. I've seen a lot of other software companies go down that road and turn back. It's easier and cheaper to let a hardware raid do all that nonsense.

Re:Type of NIC and clustered file systems? on ILM's Datacenter · 2006-03-30 08:14 · Score: 1

A low latency interconnect is only needed if you're doing message passing between nodes, which ILM is not. There's a batch scheduling node that passes each render node a list of tasks it's gonna do, and then the traffic is mostly pulling texture maps from the fileserver (into a local cache) and outputting the rendered frame. No need for InfiniBand. Multiple gigE rails might help though.

TCP offload sounds like a good idea, but I've seen it introduce a lot of bugs. It's also not terribly well supported on linux, at least not on a lot of cards.

A 5000 node san filesystem is very very large. While some of them claim to support this many nodes, few actually do, and the cost would be many millions of dollars. That's assuming iscsi, even more if you had actual san hardware.

ILM is not a movie studio on ILM's Datacenter · 2006-03-30 08:03 · Score: 1

ILM doesn't make movies. They only do special effects shots. Movie studios hire ILM to add effects shots into the movies. What they do with them, and how they cast/write/act/script the movies is really out of ILM's hands.

ILM used to have a digital movie group. Steve Jobs bought it, and it's now called pixar, err... Disney.

Re:Nice network on ILM's Datacenter · 2006-03-30 07:56 · Score: 2, Informative

I've tried bluearc. It works alright, though not as well as the whitepapers say. This is true of most everything, though. What really pissed me off about bluearc is the pre-sales engineers who seemed to have drunk a whole hell of a lot of the company Kool-Aid. The whole story is that the filesystem is "implemented all in hardware", so it's really fast, and that should solve all your problems.

Well, I've been around the block enough times to know that no filesystem is actually implemented in hardware. They may have allocations done on the disk drives like Object-Disks, or they may have the transaction model built into the network protocol, but nobody is burning filesystem asics. If they are, I wouldn't buy one. Even local filesystems take decades to work out the bulk of the bugs. If you remember the EFS to XFS transition on irix, you know what I'm talking about.

The Bluearcs are fine, the people I had to deal with were just full of themselves. They seemed completely convinced they had come up with something that had never been done before, and that it was gonna knock my socks off. Except I'd been there, done that, and knew about how much it should cost. I've never installed a site with bluearc, but it's been a contender several times.

Re:Nice network on ILM's Datacenter · 2006-03-30 04:55 · Score: 5, Interesting

That's a funny question because I used to work at ILM's (San Rafael, much less shiny) lab, benchmarking raids, including the first version of the IBM shark. At that time we came to the conclusion that the IBM raid was reliable, and reasonably fast, but the price was so far out of line that it wasn't a real contender.

The shark, and many of the high-end raids, are really designed around transaction oriented applications (databases). ILM's applications are classic video codes, which work better on a classic raid5 than they do on the data-sprinkler style raids like the shark, eva, clariion, etc. Netapp makes pretty decent storage boxes, and they're highly configurable, so I'm sure they have them fine tuned to the apps' preferred i/o size.

Furthermore, the nas/san has more to do with the Spinnaker software than the raid of choice. Back when I worked there, ILM was testing cluster solutions, but the renderfarm was a bunch of sgi origins. The storage was hung off of a couple of 8-way irix boxes, and pushed around with NFS. Since then they've upped their compute capacity by a factor of 30; there's no way they'd be able to do all that I/O with NFS to a couple of big servers. The san setup lets them distribute the NFS load to a large number of servers, all sharing access to the storage on a san. A lot of other cluster filesystems allow this too.

From the benchmarking I've done of these types of storage clusters, you don't get the same single stream performance as you do from a big-iron server setup, but the aggregate across a large number of nodes is pretty good. Managing the mess, and reliability, can be problematic. I've never used Spinnaker, but I've used almost all the other products in this space, and they're all in the "pretty good" category. My current favorite is apple's xsan, because it is really inexpensive, and so is the hardware.

Re:Buzz word. on Cray Introduces Adaptive Supercomputing · 2006-03-24 10:10 · Score: 2, Insightful

2 years behind in announcements, let's see who brings it to market first.

Sadly the answer is that it's not even a race. SGI brought forward their first step already, but won't get past that. You can now buy an fpga blade to put in your altix. While cray is just now announcing a unified vision for this, they've already had their fpga based solution since they bought OctigaBay 2 years ago.

As much as cray is suffering financially, SGI is in much worse shape, and they have about $350 million in debt around their neck, which makes them an unlikely target for a buy-out, at least until they go through bankruptcy for a while. I doubt that SGI has any money to spend on long-term engineering efforts like a vector cpu. They hopped on the fpga bandwagon because they could buy them from Xilinx, slap a numalink on them, and stuff them into an altix with relatively little investment. Thus far cray has had a great deal of luck porting bioinformatics codes to the fpga in the xd1. (smith-waterman alignment, if anyone cares.) This is a market much more in line with SGI's market strengths and somewhat new for cray, who is used to selling machines with an entry-level price of $2 million.

In any case, it's the logical path forward for Cray's 4 product lines, even if no one combines vector, fpga, and multithreaded processors. They all benefit from being paired with opteron nodes, and from reducing the number of parts cray has to maintain. SGI is coming from the other direction, which is to add processor types to their interconnect foundation. It's still a good idea, but it's probably more capital-intensive than what SGI's capable of these days.

Re:Complexity, current machines on Cray Introduces Adaptive Supercomputing · 2006-03-24 09:51 · Score: 2, Interesting

The X1 processor is already a coprocessor. Not in the sense that it's on a different piece of silicon from the scalar unit, but that the vector CPU's instruction stream is distinct from the scalar unit. In past cray systems, some cpu's used the same functional units for the scalar unit and vector unit, (T90) while some (J90) used distinct scalar units. The X1 is a vector unit bolted on the side of a MIPS scalar core, with synchronization logic, and multi-ported register files to support multi-streaming. I don't know what latency there is for the scalar unit reading/writing a vector register, but I can definitely imagine a vector co-processor linked to an opteron with coherent-hypertransport. Maybe in black widow, rather than cascade. Cray has been cheering how much faster black widow will be at scalar codes than the X1.

The trick, of course, is how do you get the opteron and the vector processor to share access to memory? No way does hypertransport have enough bandwidth to feed the vector unit. You don't want the scalar unit to have to read the memory through the hypertransport through the vector unit. Do you give them distinct memories that are connected in some form of numa?
The current X1 uses 32-channel rdram for 4 cpus. Assuming the black widow processor is twice as fast, and only 1 vector cpu per node: to provide the same bandwidth per flop, you need at least 12-channel xdr memory, or 4-channel xdr2. The opteron keeps going with dual channel ddr2, and has one hypertransport channel connecting the register files, one for numa memory, and one to talk to the seastar(s). Also, do the vector units and the scalar processors share the same interconnect controllers? You would want more than 1 seastar for each vector node, maybe 4.

Hmm. I'm sure there are technical hurdles a-plenty, but it sounds good on paper.

Re:Supercomputing v Distributed Computing on Cray Introduces Adaptive Supercomputing · 2006-03-24 09:19 · Score: 1

"ok, I shouldn't have said anything about processor speed. am I right in thinking that there's still no reason this adaptive approach will necessarily be any better than a distributed project?"

Really 2 different tools for 2 different problems. Distributed computing is really only useful for highly parallel tasks that require a LOT of computation on very little raw data. Furthermore, it has to be on data you don't mind shipping off to joe anonymous to be processed.

Current and future crays are typically run on modestly parallel tasks that use an enormous amount of raw data, do a lot of communication between nodes, and output a ton of intermediate and output data. Some of the machines use regular opteron processors, but they have 3 custom processor designs for specific problem classes that do not perform well on a general-purpose cpu. The interconnect offers multiple tens of gigabytes of bandwidth per processor, and latencies of a dozen microseconds. The CPUs have tens or hundreds of gigabytes per second of memory bandwidth. Equally important is that most crays are sold to either the department of energy, the department of defense, or their subcontractors. These systems are most likely not on the same network with any computer that is also connected to the internet. The data is almost certainly classified.

By comparison: Distributed computing has bandwidth measured in kilobits per second, and latency measured in seconds. It may offer a lot of cpu's for little or no cost, but that's not appropriate for the majority of high performance computing tasks.
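A rough sketch of how far apart those two worlds are, using the usual latency-plus-size-over-bandwidth model for moving one message. The specific numbers (a 1 MB message, a 512 kbit/s home link with 1 second of latency, a 10 GB/s interconnect with 12 microseconds of latency) are illustrative assumptions picked to match the orders of magnitude above, not measurements of any real system:

```python
def transfer_seconds(message_bytes: float, bandwidth_bytes_per_s: float, latency_s: float) -> float:
    """Simple model: fixed latency plus serialization time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


MESSAGE_BYTES = 1_000_000  # 1 MB of intermediate data (assumed)

# Volunteer machine on the internet: 512 kbit/s, about 1 s latency (assumed).
internet = transfer_seconds(MESSAGE_BYTES, 512_000 / 8, 1.0)

# Supercomputer interconnect: 10 GB/s, about 12 microseconds latency (assumed).
interconnect = transfer_seconds(MESSAGE_BYTES, 10e9, 12e-6)

print(f"internet:     {internet:.3f} s")
print(f"interconnect: {interconnect * 1e6:.0f} microseconds")
print(f"ratio:        about {internet / interconnect:,.0f}x")
```

A code that exchanges data between nodes every timestep simply can't hide a gap of five orders of magnitude, which is why distributed projects stick to tasks with almost no communication.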

Incidentally, the distinctions I draw are not unique to Cray. A lot of other supercomputers share high bandwidth, low latency, and high security. SGI also has put out an fpga blade that can be plugged into their altix supercomputers, which is pretty much the same as what cray is working on here. Cray just has a few more types of custom processors.

Re:building machines around problems on Cray Introduces Adaptive Supercomputing · 2006-03-24 08:57 · Score: 1

That's not really true either.
You only buy as much hardware as you need, but hardware is only half the cost of a computer like that. Infrastructure (physical, hardware, administrative, and management), software, and planning are a big part of the cost. Every time you install a capability-class machine, you plan for it several years in advance, make space/power/cooling available for it. Hire people to manage the machine. Port your applications to the machine. Install the machine, test and benchmark. Then administer the beast. There are a lot of unhappy owners of cluster machines in the capability class.

Where clusters have been very successful is in capacity class machines. These are the machines I work with. They are typically small clusters of 10-80 2-4cpu machines. HP and IBM have made a killing in this part of the HPC market. They often run off-the-shelf software like nastran or some such. They are often not bought specifically designed for a problem, but are cookie-cutter purchases. They're cheap, so it doesn't matter if the machine is not 100% utilized. Many of my clients run more than one such cluster.

The part of the market to which cray sells computers is where the cluster paradigm has been only modestly successful. (not as powerful/efficient as promised, more expensive to buy/own/maintain than originally anticipated) Clusters are definitely worth considering in that space, and there are several vendors trying to sell clusters of that level. However, the cluster story is not so overwhelmingly powerful as to eliminate full-on supercomputers from consideration.

Re:And What If... on Cray Introduces Adaptive Supercomputing · 2006-03-24 08:24 · Score: 3, Insightful

The idea is that all the CPU types will be blades that all use the same router, and plug into a common backplane, and that the cabinets all cable together the same way. In all cases, I imagine there will be opterons around the periphery, as I/O nodes and running the operating system. Then you plug in compute nodes in the middle, where the compute nodes can be a bunch more opterons, or vector cpu's, or fpga's, or multithreaded cpus. There will certainly be plenty of customers only interested in lotsa opterons on cray's fast interconnect, and they just won't buy any of the custom cpus.

Comments · 368