They used to, and the X1 still holds true to that. If you take the skins off, it is a marvel of stainless steel, plumbing, and just plain fantastic mechanical engineering. The XT3 and MTA, however, are just more rectangular racks. The XD1 is just a dull 3U rackmount.
They really aren't relying on compiler improvements, so much as passing the code through their vectorizing compiler, and a tool for generating their FPGA codes. If these 2 steps fail to optimize the code very much, you bail out and send it to the general purpose (Opteron) processors.
You're being fairly pedantic about the computer architecture anyway. Yes, pairing multiple processor types together is not new, but most MPP supercomputers use identical node types.
The gist of this story is simpler than it sounds. Cray has 4 product lines with 4 CPU types, 4 interconnect routers, 4 cabinet types, and 4 operating systems. They would like to condense this down. The first step is to reuse components from one machine to the next. There are distinct advantages to keeping the 4 CPU types for various problem sets, but most everything else could be multi-purpose. From the sounds of things, they're using the next generation of the seastar router in all of the machines. Thus you use the same router chips, cabling, backplane, and frame for all the products. This reduces the number of unique components Cray has to worry about. If they go to DDR2 memory on the X1 and MTA, that further simplifies things, though I suspect they won't.
Well, once you share parts, why not make a frame with a bunch of general purpose CPUs for unoptimized codes, and a few FPGA or vector CPUs for the highly optimized codes? It allows customers more flexibility, and introduces Cray's mid-range customers to the possibility of using the really high-end vector processors currently reserved for the high-end X1 systems. It's also a win for the current high-end customers. On the current X1 systems, you have these very elaborate processors running the user's optimized application, but the vector CPUs also end up running scalar codes like utilities and the operating system. These are tasks the vector CPUs aren't terribly good at, and you're using a $40,000 processor to run tasks a $1000 Opteron will do better. Even if the customer isn't interested in mix-n-match codes on the system (and I'm skeptical any Cray customer really is), you probably want to throw a few dozen Opteron nodes into the X1's successor, just to handle the OS, filesystems, networking, and the batch scheduler.
Yeah, except substitute 1000 disks with 10,000 disks. They almost certainly are striping across a bunch of mid-range IBM RAIDs, each with ~100 disks, and each probably delivering around 1-2 GB/s.
It's also striping across many machines in a cluster. Each of those nodes maxes out at 'only' 15 GB/s of I/O, so they wire up all the nodes to a bunch of fibre channel cards, and plug them all into the raids, to distribute the I/O access to the nodes. GPFS also lets you do the I/O over the cluster interconnect, but then your interconnect bandwidth usable by the application has to compete with the filesystem traffic.
As for coordinating all the parallelism, there's a metadata node (actually a failover pair of nodes) that does the metadata operations (create, rename, remove, link) and each cluster node does file I/O directly to disk. Typically, each of the nodes writes to separate files, to avoid having to do concurrent I/O. You can have all the nodes write to different byte ranges within the same file, but you have to use special flags to enable this, and the application has to be written to legitimately write to very distant parts of the file. Often it's simplest just to write to different scratch files for intermediate results, and then combine the output at the end of the run.
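To make the two write patterns above concrete, here is a rough sketch on a plain local filesystem. It is not GPFS-specific and doesn't touch the special flags mentioned above; the region size, node count, and file names are all made up for illustration, and the loop over ranks stands in for what would really be separate processes on separate nodes.

```python
import os
import tempfile

REGION = 4096  # hypothetical bytes of output per node
NODES = 4      # hypothetical node count


def node_output(rank):
    # Stand-in for one node's intermediate results.
    return bytes([rank]) * REGION


def shared_file(path):
    """Pattern 1: every node writes its own byte range of one file."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        for rank in range(NODES):
            # pwrite takes an explicit offset, so writers never share
            # a file position and the ranges never overlap.
            os.pwrite(fd, node_output(rank), rank * REGION)
    finally:
        os.close(fd)


def scratch_then_combine(directory, path):
    """Pattern 2: one scratch file per node, combined at the end of the run."""
    parts = []
    for rank in range(NODES):
        part = os.path.join(directory, f"scratch.{rank}")
        with open(part, "wb") as f:
            f.write(node_output(rank))
        parts.append(part)
    with open(path, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                out.write(f.read())


def run_demo():
    """Return True if both patterns produce byte-identical output."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "shared.dat")
        b = os.path.join(d, "combined.dat")
        shared_file(a)
        scratch_then_combine(d, b)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            return fa.read() == fb.read()
```

Both patterns end up with the same bytes on disk. The difference on a real cluster is who has to coordinate: the filesystem for the shared file, or you at the end of the run for the scratch files.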
GPFS is one of the more entrenched parallel cluster filesystems available. (Others include the classic VAXcluster fs, Tru64 CFS, Red Hat GFS, ADIC StorNext, Lustre, SANergy, PolyServe, and others.) GPFS has been running on IBM's high performance clusters for a decade or more. I've used it, and it's as robust as any of the others I listed above.
I'll caution everyone that you can get 100GB/s of throughput, only if you have a hundred million dollar collection of computers and disks like Livermore has.
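For what it's worth, the back-of-the-envelope numbers above do hang together. A quick sketch, using only the rough estimates from this post (none of these are measured figures):

```python
import math

# All figures below are the rough estimates from the post, not measurements.
DISKS = 10_000            # total disks
DISKS_PER_ARRAY = 100     # ~100 disks per mid-range RAID
ARRAY_GB_PER_S = (1, 2)   # each array delivers roughly 1-2 GB/s
NODE_MAX_GB_PER_S = 15    # per-node I/O ceiling


def aggregate_estimate():
    arrays = DISKS // DISKS_PER_ARRAY
    low = arrays * ARRAY_GB_PER_S[0]
    high = arrays * ARRAY_GB_PER_S[1]
    # Minimum node count whose combined I/O ceiling covers the low estimate.
    min_nodes = math.ceil(low / NODE_MAX_GB_PER_S)
    return arrays, low, high, min_nodes


arrays, low, high, min_nodes = aggregate_estimate()
print(f"{arrays} arrays, {low}-{high} GB/s aggregate, at least {min_nodes} nodes")
```

So about a hundred arrays gets you into the 100 GB/s range, and the per-node ceiling is why the I/O has to be spread over several nodes.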
It's not at all an unreasonable claim either. The USAF has been researching skip bombers and other space planes since the early 60's. I would be disappointed with the feds if they didn't do at least some research in the area. They then shelved it because the military could get the real work of the system done in other, less-expensive ways. It's a damn nuisance that a B-2 has to fly for hours and refuel a few times just to get from the Midwest to the Middle East, drop some bombs and then go home. However, it's probably cheaper than the rest of the work necessary to put this space plane above Basra in 40 minutes.
You're right that satellite launch can't justify this. Otherwise they would be strapping a Pegasus rocket under the launcher, and letting it burn up on reentry. There is no doubt a reason to want the plane to shoot out on the other end of the flight and do something.
Which is all well and good, except that the Cell architecture gives the SPEs really fast access to a very small pool of memory, and then decent, but not astounding, access to the rest of memory through the master core. This works very well for doing a lot of number crunching on a small bit of digital media data. However, there are a lot of supercomputing problems that are bounded by the amount of memory addressable by each processor. Even Blue Gene's quarter gig per node is a problem for many codes.
If the successor to Blue Gene runs with some Cell-like processor, which I suspect it will, they'll have to do more than just drop PS3 Cells onto a board and run with it. There will have to be some rebalancing of the internal/external memory bandwidth.
I'm sorry. I got that wrong. It's not the fujitsu vpp (which is no longer being made), but rather the hitachi sr11000, which gangs together a bunch of power4's into a vector processor.
First you would have to define what you mean by "single processor". The fastest addressable processor is probably fujitsu's VPP series node, but it's not really one processor, it's 8 IBM power4+ processors ganged together with a vector coupling facility. The fastest HPC processor that sits on a single chip is the Cray X1e MSP chip, which has two 18 GFlop cores per chip. This of course ignores specialized chips like DSPs, most notably the IBM Cell chip. Cell has much higher flops performance than any chip used in HPC systems, but it's difficult to call it a single processor, as it runs 9 instruction streams at once, and only does 32-bit math anyway.
If you're asking what the fastest 64-bit, superscalar microprocessor is, for general floating point codes, it's probably Itanium, but maybe POWER5, depending on what you're doing, and how much optimization you've done.
Yes the linpack performance is better, but I think 38tf of Earth simulator is likely to get a lot more real work done than 57tf of blue gene. The blue gene is a really elegant design for creating a very inexpensive supercomputer, which means one can affordably buy a very large system. However, the balance of the system is a little weak in terms of memory bandwidth, and interconnect bandwidth, as compared to the Earth simulator.
Furthermore, scaling most codes to the tens of thousands of processors of a blue gene is very difficult. Most jobs on the earth simulator, and other large systems, run on a few dozen processors, as the algorithms become communication bound above that number. Algorithm designers are slowly improving this bottleneck, but it's something that must be done anew for each piece of code.
Not to say that this isn't a fantastic machine, and one offered at a very modest price for the capability. I simply object to the title of "Japan's fastest supercomputer".
My information is out of date? Well, that's also the data Apple was using when deciding what CPU to use in their newest offerings. While the slashdork crowd might use up-to-the-minute reviews and speculation to drive their purchasing decisions, Apple takes a more measured route. While we think of Apple as being very daring compared to other computer companies, it's also in the business of making money, and switching architectures is a major commitment, which has to pan out over several years.
Let's look at the example of Dell. Dell is nothing if not good at making money. We can make fun of Dell all we want for not using Opteron, a clearly superior server chip compared to Xeon. However, Dell is not in the business of selling processors. They are selling server solutions, and they are selling a brand. Apple is less conservative than Dell, but they want a decision that will drive their products for at least 2 generations of iMacs. They can reexamine AMD at a later date, but they obviously feel that Intel can offer them compelling processors for very light laptops, for at least 4 years. Intel has a strong history here.
Furthermore, all of the benchmarks I've seen seem to indicate that Core Duo is a very nice processor.
There are no AMD macs for a reason, a good reason. Apple had a fine-and-dandy desktop processor. The G5 works very well in their workstation-class machines. The real problem was the laptops. Half+ of the macs sold are laptops.
AMD competes very well in the desktop processor arena, and pretty well in the desktop-replacement notebook segment. However, they don't have a great answer in the thin-n'-light laptop segment. This is an area dominated by intel. Choosing to make AMD-based macs would not have improved Apple's position relative to the power architecture. They would still have good desktop systems, and still would struggle with laptops, and minis.
While I'm sure the marketing dollars are nice, I think that's not the primary reason for Apple to choose Intel over AMD.
Why do we want the public to be aware of what type of CPU is inside the box? Doesn't that go against the last 15 years of development in personal computers? Isn't it good that the consumer doesn't need to be bothered with this sort of thing? I think it's a positive thing that buying a PC, except for the enthusiast or yuppie-gamer, is more and more like buying a stereo, or a microwave.
Apple has done a good job of telling the consumer: You aren't buying a central processing unit linked to a primary memory unit, linked to a magnetic and optical secondary storage controller, and a 3-dimensional rasterizing coprocessor. Rather, you are buying a mac. You are not a systems administrator buying a piece of industrial equipment, you are a consumer buying a gadget.
I don't think it is our role, as computer people, to make the rest of the world more computer savvy, as we shall fail in this. Rather it is our job to make computers more usable, such that the consumer need not be more savvy. Apple has done a good job of creating usable software, and it's only natural that their marketing should mirror this philosophy. Should we question outrageous claims of what a mac can do that other PCs can't, or unrealistic claims of how a mac is better than a PC? Sure we should, as we should be skeptical consumers of all marketing. But I think it's good that Apple sells computers the same way that Levi's sells jeans.
Absolutely. Disk drives are cheap, fast, and relatively durable.
I work in the data archive field, and we don't see optical jukeboxes anymore. I think HP still makes one, but everyone else is out of that market. The preferred method is high-speed tape, but there's an entry cost for a low end changer (about $10,000) that makes it prohibitive for desktop users. Second disks are a fantastic way to back up data, and you're seeing that even in the enterprise space. It can't compete with tape in GB/$, or in some of the archival automation, but it's getting close.
The important thing with disk, just as it is with tape and with optical, is to make AT LEAST 2 backups, and to store them in a different place. I don't know how many data centers I've walked into where the tape library is sitting in the same rack as the raid, and they don't use the vaulting features. Yeah, you're protected against a disk failing, except if the failure is in a fire, or a flood.
If you care about your data, get three drives, a safety deposit box, and a firebox.
You're absolutely correct, as free-space fragmentation can play a HUGE role in the speed of space allocation. Of course, this plays no role at all in stat, rename, remove, or readdir operations, or in any reads or writes to existing parts of files.
Since the benchmarks presented are so rudimentary anyway, this is maybe not the first thing to worry about.
Except you don't want to do this. As disks approach full, the contiguous stretches of free space approach length zero, due to fragmentation. This is true on all filesystems. The result of this is that space allocation on a 98% full disk is much much slower than on a 2% full disk. With disks as cheap as they are, one shouldn't be sitting around with 95% full disks. If that's the case, there are work-flow/administration issues that need to be worked out, rather than unlocking that last little bit of space.
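A toy model of the effect, for anyone who wants to see it. This is only a sketch under a crude assumption: used blocks are scattered uniformly at random over the disk, which no real allocator does, and the block count and seed are arbitrary. It doesn't measure allocation time at all. It only shows how the longest contiguous free run shrinks as the disk fills, which is the thing the allocator has to hunt for.

```python
import random


def longest_free_run(fill, blocks=100_000, seed=0):
    """Longest contiguous run of free blocks on a toy disk.

    fill   -- fraction of blocks in use (0.0 to 1.0)
    blocks -- hypothetical disk size in blocks
    seed   -- fixed so the result is repeatable
    """
    rng = random.Random(seed)
    used = set(rng.sample(range(blocks), int(blocks * fill)))
    longest = run = 0
    for i in range(blocks):
        # Extend the current free run, or reset it at a used block.
        run = 0 if i in used else run + 1
        longest = max(longest, run)
    return longest


for fill in (0.02, 0.50, 0.95, 0.98):
    print(f"{fill:.0%} full: longest free run = {longest_free_run(fill)} blocks")
```

Real filesystems fight this with extents, allocation groups, and delayed allocation, so the real curve is kinder than the toy one, but it bends the same way.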
As I recall, the default on xfs for irix was to reserve the top 10% for root only.
You raise an excellent point, though the only way to get ahold of enough hardware to make that test interesting is to get the system vendor to provide the hardware, in which case you often have limited ability to publish any results they don't like. (Been there, didn't publish that)
Furthermore, once you get into that high-end of a system, you're generally not all that interested in "general purpose" benchmarks. I have a lot of experience benchmarking filesystems on high-end systems. (15GBytes/s and so on) In those cases you're benchmarking everything: the application, the filesystem, the filesystem settings, the operating system, the OS settings, multipathing drivers, san environments, raid controllers, down to even the disk drives in the raids. It's hard to isolate the filesystem from this mess, except in the performance of the particular application.
In a sense, generic benchmarks only make sense on small servers and workstations, as you run a diverse set of applications, and have a limited set of hardware that changes only modestly with time (though 500 MHz is getting pretty antique there, dude). Benchmarking a dual 2.4 GHz Dell slab with a mirrored pair of 10k SCSI drives might be a little more useful, as there are a LOT of those out there running Linux. Benchmark mail-serving, web-serving, file-serving. Since these are the sweet spot for Linux servers, benchmarking these things would probably be most instructive to the broadest group of people. The microbenchmarks Mr. Piszcz runs are a little too workstation-like for my tastes. I don't consider workstation disk performance to be all that important, at least compared to server tasks.
No they won't! They have no reason to. The vector units that a cray uses aren't like altivec, sse, or other "bolt-on" vector units. The vector unit on a cray (or NEC) is a latency hiding mechanism. It's a method for forcing the programmer/compiler to structure the code such that the data loaded from memory is used a significant period of time after the load is initiated. This works pretty well on the HPC code that is used on crays, but not at all for the everyday server/workstation code that opterons run. Furthermore, to support that sort of vector unit, you need to have about eight times as much memory bandwidth as an opteron, which means many more pins on the socket, which are very expensive.
I think you're much more likely to see the cray vector processor retooled with lots of hypertransport connections, so it can use an opteron as its scalar unit, and use the same seastar routers that the xt3 uses. On the X1, the scalar unit already runs ahead of the vector unit, so I bet it's not all that important for the scalar unit to be on-die.
Of course, Cray didn't invent the 6400 either. They bought out a competitor in bankruptcy. Sun, at that time, wasn't quite ready to go head-to-head with S/390s, which is essentially what they did when they finally got around to selling the E10000 (same computer, new name). Chipset is one key, but software is an even bigger key.
Close. Craylink was designed at SGI, and renamed to craylink after they bought Cray. They introduced craylink in the origin2000, which they started selling half a year after buying cray, so I'm sure they couldn't have integrated any cray-designs into their product in that span.
After they sold Cray to Tera, SGI started calling the technology Numalink, and currently use it in their origin3, altix3, and altix4 product lines. They are on the 4th generation of the technology, which is 3.2GB/s per direction. The cray that was sold to Tera included the half-finished X1 system, which also uses numalink. It uses the older 1.6GBps/dir links, but uses 32 networks in parallel for a total of ~50GB/s/dir per node.
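The ~50GB/s figure is just the per-link number multiplied out. A trivial sketch, using only the two figures quoted above:

```python
LINK_GB_PER_S = 1.6      # per-direction bandwidth of one older-generation link
PARALLEL_NETWORKS = 32   # networks used in parallel per node


def node_bandwidth():
    # Aggregate per-direction bandwidth per node, ignoring any protocol overhead.
    return LINK_GB_PER_S * PARALLEL_NETWORKS


print(f"{node_bandwidth():.1f} GB/s per direction per node")
```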
The Cray XT3 uses a newer network interconnect called seastar, which offers 3.8GBps/direction. This is probably what will be used in the X1's successor.
The Cray XD1, which your colleague bought, is a product Cray acquired when they bought OctigaBay. It uses an interconnect called the RapidArray switch, which provides 4GBps/direction of interconnect.
All of these interconnects are high-bandwidth and low latency. The XD1 is also very inexpensive for a Cray, which is always nice.
Yes, true. And NEC has done a great job of doing this for the last several generations of their vector machines. I have never programmed for an SX, and don't know much about them. The really nice thing about the X1 is that under the covers it's running Irix, which is a pretty reasonable Unix variant. Anyone know anything about SUPER-UX?
Which is true only insomuch as the old-time reputation could not possibly exist today. That was the cold war, this is the post-cold-war era. The old Cray was a mammoth beast with its own share of myopia, but a lot of technical talent. This allowed a few really brilliant concepts, and a lot of clever implementation, to power two decades of brilliant computers. That said, they were brilliant solutions for the era. Old-school Cray systems were built in an era when doing fundamental pieces of math was still pretty difficult, and the government was willing to pay ten million dollars for a machine that was proficient at doing math, and many tens of millions for a machine that was really good at it.
The difficult problems in building computers have changed, and the financial climate around supercomputers has changed quite a lot. Among other things, CMOS finally became fast enough to put bipolar in its grave, single microprocessor workstations became powerful enough to do all but the hardest of scientific tasks, and the average price of high performance (not top 10 on the list, but still fast) computers has plummeted. To ask the new Cray to be like the old Cray would be foolish.
That said, New Cray still offers impressive products. All 3 of Cray's product lines have much lower entry prices than similar Crays of the 90's. They all have more manageable power/cooling/physical size characteristics. They make much greater use of industry standard disks and networks, and also can be administered and programmed much more like any other Unix computer. You program a New Cray more or less the same as other contemporary HPC systems.
When CDC introduced the 6600, the president of IBM complained to his staff, asking (paraphrase) 'how has CDC managed to best IBM's fastest computer with a staff of just 14 engineers and 4 programmers?' Seymour Cray responded "It seems like Mr. Watson has answered his own question." That new Cray is tiny does not mean that it is not capable of making impressive innovations. Old Cray's gorilla days were very wasteful, and not necessarily full of the best moments of innovation.
Now, if only they could put four X1e CPUs into an air-cooled, rack-mount server and charge a reasonable amount for it. I'd much rather have a handful of vector processors than a few dozen Opterons, any day.
Apple is more than a computer company. Ride the train someday and count the little white earbuds. iPods are becoming as ubiquitous as cell phones. While IBM, HP, and Microsoft are turning microchips and source code into corporate/industrial tools (you know, exciting, like a forklift, or a conference room), Apple has been turning chips and code into a lifestyle. How many people buy magazines about fashion, about sexy-looking cars, about rock and roll? Next question: how many people buy magazines about high efficiency diesel generators?
It takes a lot more than some core logic chips to build a big server, and takes more than a big server to acquire any market share. Look at the industry. Sun still dominates the Unix server market with expensive machines full of slow processors. Mainframes are still a billion dollar business.
While I'm excited to see the possibility, you're not going to get anyone to spend this much money on a 64-way Opteron box until they have been on the market for years, have been tested and tried, and have lots of software vendors lining up behind them. Don't believe me? Then ask why Unisys doesn't have more market share. Substitute Xeon for Opteron, and that's basically what they've been trying to sell for the last six years. The only thing Opteron has that the ES7000 doesn't is market hype.
The real change is the reduced number of people who need 32-way boxes, of any sort. How many people are really outgrowing an 8-core Opteron/Xeon box? That number keeps shrinking.
They used to, and the X1 still holds true to that. If you take the skins off, it is a marvel of stainless steel, plumbing, and just plain fantastic mechanical engineering. The Xt3 and mta, however, are just more rectangular racks. The xd1 is just a dull 3u rackmount.
They really aren't rellying on compiler improvements, so much as passing the code through their vectorizing compiler, and a tool for generating their fpga codes. If the code optimization for these 2 steps fails to optimize very much, you bail out and send it to the general purpose (opteron) processors.
Your being fairly pedantic about the computer architecture anyway. Yes, pairing multipe processor types together is not new, but most mpp supercomputers use identical node types.
The jist of this story is simpler than it sounds. Cray has 4 product lines with 4 cpu types, 4 interconnect routers, 4 cabinet types, and 4 operating systems. They would like to condense this down. The first step is to reuse components from one machine to the next. There are distinct advantages for keeping the 4 cpu types for various problem sets, but most everything else could be multi-purpose. From the sounds of things, it's using the next generation of the seastar router in all of the machines. Thus you use the same router chips, cabling, backplane, and frame for all the products. This reduces the number of unique components cray has to worry about. If they go to DDR2 memory on the X1 and mta, that further simplifies things, though I suspect they won't.
Well, once you share parts, why not make a frame with a bunch of general purpose CPUs for unoptimized codes, and a few fpga or vector cpus for the highly optimized codes? It allows customers more flexibility, and introduces cray's mid-range customers to the possibility of using the really high-end vector processors currently reserved for the high-end X1 systems. It's also a win for the current high-end customers. On the current X1 systems, you have these very elaborate processors running the user's optimized application, but the vector cpu's also end up running scalar codes like utilities and the operating system. These are tasks the vector cpu's aren't terribly good at, and you're using a $40,000 processor to run tasks a $1000 opteron will do better. Even if the customer isn't interested in mix-n-match codes on the system, (which I'm skeptical any cray customer really is), you probably want to throw a few dozen opteron nodes into the X1's successor, just to handle the OS, filesystems, networking, and the batch scheduler.
yeah, except substitute 1000 disks with 10,000 disks. They almost certaintly are stiping across a bunch of mid-range IBM raids, each with ~100 disks, and probably getting around 1-2 GB/s.
It's also striping across many machines in a cluster. Each of those nodes maxes out at 'only' 15 GB/s of I/O, so they wire up all the nodes to a bunch of fibre channel cards, and plug them all into the raids, to distribute the I/O access to the nodes. GPFS also lets you do the I/O over the cluster interconnect, but then your interconnect bandwidth usable by the application has to compete with the filesystem traffic.
As for coordinating all the parallelism, there's a metadata node (actually a failover pair of nodes) that does the metadata operations (create, rename, remove, link) and each cluster node does file I/O directly to disk. Typically, each of the nodes write to seperate files , to avoid having to do concurrent I/O. You can have all the nodes write to different byte ranges within the same file, but you have to use special flags to enable this, and the application has to written to legitimately write to very distant parts of the file. Often it's simplest just to write to different scratch files for intermediate results, and then combind the output at the end of the run.
GPFS is one of the more entrenched parallel cluster filesystems available. (others include the classic vax cluster fs, Tru64 cfs, redhat gfs, adic stornext, lustre, Sanergy, polyserve, others) GPFS has been running on IBM's high performance clusters for a decade or more. I've used it, and it's as robust as any of the others I listed above.
I'll caution everyone that you can get 100GB/s of throughput, only if you have a hundred million dollar collection of computers and disks like Livermore has.
It's not at all an unreasonable claim either. The usaf has been researching skip bombers and other space planes since the early 60's. I would be disappointed with the feds if they didn't do at least some research in the area. They then shelved it because the military could get the real work of the system done in other, less-expensive ways. It's a damn nuisance that a B-2 has to fly for hours, refuel a few times just to get from the midwest to the midle-east, drop some bombs and then go home. However, it's probably cheaper than the rest of the work necessary to put this space plane above Basra in 40 minutes.
You're right that satelie launch can't justify this. Otherwise they would be strapping a pegasus rocket under the lancher, and letting it burn up on reentry. There is no doubt a reason to want the plane to shoot out on the other end of the flight and do something.
Which is all well and good, except that the cell architecture gives the spe's really fast access to a very small pool of memory, and then a decent, but not astounding access to the rest of memory through the master core. This works very well for doing a lot of number crunching on a small bit of digital media data. However, there are a lot of supercomputing problems that are bounded by the amount of memory addressable by each processor. Even Blue Gene's quarter gig per node is a problem for many codes.
If the successor to blue gene runs with some cell-like processor, which I suspect it will, they'll have to do more than just drop PS3 cells onto a board and run with it. There will have to be some reballancing of the internal/external memory bandwidth.
I'm sorry. I got that wrong. It's not the fujitsu vpp (which is no longer being made), but rather the hitachi sr11000, which gangs together a bunch of power4's into a vector processor.
First you would have to define what you mean by "single processor". The fastest addressable processor is probably fujitsu's VPP series node, but it's not really one processor, it's 8 IBM power4+ processors ganged together with a vector coupling facility. The fastest hpc processor that sits on a single chip is the cray X1e msp chip, which has 2 18GFlop cores per chip. This of course ignores specialized chips like DSPs, most notably the IBM cell chip. Cell has a much higher flops performance than any chips used in HPC systems, but it's difficult to call it a single processor, as it runs from 9 instruction streams at once, and only does 32bit math anyway.
If you're asking what the fastest 64bit, superscallar microprocessor is, for general floating point codes, it's probably itanium, but maybe power5, depending on what you're doing, and how much optimization you've done.
Yes the linpack performance is better, but I think 38tf of Earth simulator is likely to get a lot more real work done than 57tf of blue gene. The blue gene is a really elegant design for creating a very inexpensive supercomputer, which means one can affordably buy a very large system. However, the balance of the system is a little weak in terms of memory bandwidth, and interconnect bandwidth, as compared to the Earth simulator.
Furthermore, scaling most codes to the tens of thousands of processors of a blue gene is very difficult. Most jobs on the earth simulator, and other large systems, run on a few dozen processors, as the algorithms become communication bound above that number. Algorithm designers are slowly improving this bottleneck, but it's something that must be done anew for each piece of code.
Not to say that this isn't a fantastic machine, and one offered at a very modest price for the capability. I simply object to the title of "Japan's fastest supercomputer".
My information is out of date? Well, that's also the data Apple was using when deciding what CPU to use in their newest offerings. While the slashdork crowd might use up-to-the-minute reviews and speculation to drive their purchasing decissions, Apple takes a more measured route. While we think of Apple as being very daring compared to other computer companies, it's also in the business of making money, and switching architectures is a major commitment, which has to pan out over several years.
Lets look at the example of Dell. Dell is nothing if not good at making money. We can make fun of Dell all we want for not using opteron, a clearly superior server chip, as compared to Xeon. However, Dell is not in the business of selling processors. They are selling server solutions, and they are selling a brand. Apple is less conservative than Dell, but they want a decission that will drive their products for at least 2 generations of iMacs. They can reexamine AMD at a later date, but they obviously feel that Intel can offer them a compelling processors for very light laptops, for at least 4 years. Intel has a strong history here.
Furthermore, all of the benchmarks I've seen, seem to indicate that CoreDou is a very nice processor.
There are no AMD macs for a reason, a good reason.
Apple had a fine-and-dandy desktop processor. The G5 works very well in their workstation-class machines. The real problem was the laptops. Half+ of the macs sold are laptops.
AMD competes very well in the desktop processor arena, and pretty well in the desktop-replacement notebook segment. However, they don't have a great answer in the thin-n'-light laptop segment. This is an area dominated by intel. Choosing to make AMD-based macs would not have improved Apple's position relative to the power architecture. They would still have good desktop systems, and still would struggle with laptops, and minis.
While I'm sure the marketing dollars are nice, I think that's not the primary reason for Apple to choose Intel over AMD.
Why do we want the public to be aware of what type of CPU is inside the box? Doesn't that go against the last 15 years of developments in the personal computer evolution? Isn't it good that the consumer doesn't need to be bothered with this sort of thing? I think it's a possitive thing that buying a PC, except for the enthusiast or yuppie-gamer, is more and more like buying a stereo, or a microwave.
Apple has done a good job of telling the consumer: You aren't buying a central processing unit linked to a primary memory unit, linked to a magnetic and optical secondary storage controller, and a 3-dimensional rasterizing coprocessor. Rather, you are buying a mac.
You are not a systems administrator buying a piece of industrial equipment, you are a consumer buying a gadget.
I don't think it is our role, as computer people, to make the rest of the world more computer savvy, as we shall fail in this. Rather it is our job to make computers more usable, such that the consumer need not be more savvy. Apple has done a good job of creating usable software, and it's only natural that their marketing should mirror this philosophy. Should we question outrageous claims of what a mac can do that other PCs can't, or unrealistic claims of how a mac is better than a PC? Sure we should, as we should be skeptical consumers of all marketing. But I think it's good that Apple sells computers the same way that Levi's sells jeans.
Absolutely.
Disk drives are cheap, fast, and relatively durable.
I work in the data archive field, and we don't see optical jukeboxes anymore. I think HP still makes one, but everyone else is out of that market. The preferred method is high-speed tape, but there's an entry cost for a low end changer (about $10,000) that makes it prohibitive for desktop users. Second disks are a fantastic way to back up data, and you're seeing that even in the enterprise space. Disk can't compete with tape in GB/$, or in some of the archival automation, but it's getting close.
The important thing with disk, just as it is with tape and with optical, is to make AT LEAST 2 backups, and to store them in a different place. I don't know how many data centers I've walked into where the tape library is sitting in the same rack as the raid, and they don't use the vaulting features. Yeah, you're protected against a disk failing, except if the failure is in a fire, or a flood.
If you care about your data, get three drives, a safety deposit box, and a firebox.
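For the desktop crowd, the "at least 2 backups" rule is easy to script. This is only a rough sketch: the function names `sha256_of` and `back_up` are made up for illustration, and it assumes the destinations are directories on separate drives that you then physically store in different places (the script can't do that part for you). It copies a file to every destination and checks each copy against a checksum of the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination directory and verify every copy.

    Refuses to run with fewer than two destinations, since a single
    backup is exactly the situation the rule is meant to avoid.
    """
    if len(destinations) < 2:
        raise ValueError("make at least two backups")
    expected = sha256_of(source)
    copies = []
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 preserves timestamps and returns the path of the new file
        copy = Path(shutil.copy2(source, dest_dir))
        if sha256_of(copy) != expected:
            raise IOError(f"copy in {dest_dir} does not match the original")
        copies.append(copy)
    return copies
```

Something like `back_up("photos.tar", ["/mnt/drive_a", "/mnt/drive_b"])` would be the typical call, with those mount points being placeholders for wherever your drives show up.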
You're absolutely correct, as free-space fragmentation can play a HUGE role in the speed of space allocation. Of course, this plays no role at all in stat, rename, remove, or readdir operations, or in any reads or writes to existing parts of files.
Since the benchmarks presented are so rudimentary anyway, this is maybe not the first thing to worry about.
Except you don't want to do this. As disks approach full, the contiguous stretches of free space approach length zero, due to fragmentation. This is true on all filesystems. The result of this is that space allocation on a 98% full disk is much, much slower than on a 2% full disk. With disks as cheap as they are, one shouldn't be sitting around with 95% full disks. If that's the case, there are work-flow/administration issues that need to be worked out, rather than unlocking that last little bit of space.
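If you want to see the effect without filling a real disk, here's a toy model. It is a sketch under a big assumption: used blocks are scattered uniformly at random, which no real filesystem allocator does (they work hard to keep extents together), so treat the numbers as an illustration of the trend and not a prediction for any particular filesystem. The helper names are made up.

```python
import random


def longest_free_run(blocks):
    """Return the length of the longest contiguous run of free (False) blocks."""
    best = run = 0
    for used in blocks:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def fill_randomly(total_blocks, fraction_full, seed=0):
    """Mark a random `fraction_full` of the blocks as used.

    Random placement is a crude stand-in for a disk that has aged through
    many create/delete cycles; it is an assumption of this sketch.
    """
    rng = random.Random(seed)
    blocks = [False] * total_blocks
    for i in rng.sample(range(total_blocks), int(total_blocks * fraction_full)):
        blocks[i] = True
    return blocks


if __name__ == "__main__":
    for pct in (2, 50, 90, 98):
        blocks = fill_randomly(100_000, pct / 100)
        print(f"{pct:3d}% full -> longest free extent: "
              f"{longest_free_run(blocks)} blocks")
```

The longest free extent shrinks drastically as the fill level rises, which is why the allocator has to stitch a new file together from many small pieces on a nearly full disk.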
As I recall, the default on xfs for irix was to reserve the top 10% for root only.
You raise an excellent point, though the only way to get ahold of enough hardware to make that test interesting is to get the system vendor to provide the hardware, in which case you often have limited ability to publish any results they don't like. (Been there, didn't publish that)
Furthermore, once you get into that high-end of a system, you're generally not all that interested in "general purpose" benchmarks. I have a lot of experience benchmarking filesystems on high-end systems. (15GBytes/s and so on) In those cases you're benchmarking everything: the application, the filesystem, the filesystem settings, the operating system, the OS settings, multipathing drivers, san environments, raid controllers, down to even the disk drives in the raids. It's hard to isolate the filesystem from this mess, except in the performance of the particular application.
In a sense, generic benchmarks only make sense on small servers and workstations, as you run a diverse set of applications, and have a limited set of hardware that changes only modestly with time (though 500MHz is getting pretty antique there dude). Benchmarking a dual 2.4GHz dell slab with a mirrored pair of 10k scsi drives might be a little more useful, as there are a LOT of those out there running linux. Benchmark mail-serving, web-serving, file-serving. Since these are the sweet-spot for linux servers, benchmarking these things would probably be most instructive to the broadest group of people. The microbenchmarks Mr. Piszcz runs are a little too workstation-like for my tastes. I don't consider workstation disk performance to be all that important, at least compared to server tasks.
No they won't! They have no reason to. The vector units that a cray uses aren't like altivec, sse, or other "bolt-on" vector units. The vector unit on a cray (or NEC) is a latency hiding mechanism. It's a method for forcing the programmer/compiler to structure the code such that the data loaded from memory is used a significant period of time after the load is initiated. This works pretty well on the HPC code that is used on crays, but not at all for the everyday server/workstation code that opterons run. Furthermore, to support that sort of vector unit, you need to have about eight times as much memory bandwidth as an opteron, which means many more pins on the socket, which are very expensive.
I think you're much more likely to see the cray vector processor retooled with lots of hypertransport connections, so it can use an opteron as its scalar unit, and use the same seastar routers that the xt3 uses. On the X1, the scalar unit already runs ahead of the vector unit, so I bet it's not all that important for the scalar unit to be on-die.
Of course, cray didn't invent the 6400 either. They bought out a competitor in bankruptcy. Sun, at that time, wasn't quite ready to go head-to-head with s/390s, which is essentially what they did when they finally got around to selling the E10000 (same computer, new name). Chipset is one key, but software is an even bigger key.
Close.
Craylink was designed at SGI, and renamed to craylink after they bought Cray. They introduced craylink in the origin2000, which they started selling half a year after buying cray, so I'm sure they couldn't have integrated any cray-designs into their product in that span.
After they sold Cray to Tera, SGI started calling the technology Numalink, and currently use it in their origin3, altix3, and altix4 product lines. They are on the 4th generation of the technology, which is 3.2GB/s per direction. The cray that was sold to Tera included the half-finished X1 system, which also uses numalink. It uses the older 1.6GBps/dir links, but uses 32 networks in parallel for a total of ~50GB/s/dir per node.
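The "~50GB/s" figure is just the per-link rate multiplied by the number of parallel networks. A quick check of the arithmetic, using only the two numbers quoted above (the variable names are mine):

```python
# Per-direction rate of one older-generation link, in GB/s (figure quoted above)
LINK_GB_PER_S = 1.6
# Number of networks the X1 runs in parallel (figure quoted above)
PARALLEL_NETWORKS = 32

# Aggregate per-direction bandwidth per node
aggregate_gb_per_s = LINK_GB_PER_S * PARALLEL_NETWORKS
print(f"{aggregate_gb_per_s:.1f} GB/s per direction per node")
```

That works out to 51.2GB/s per direction, which I rounded to ~50.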
The Cray XT3 uses a newer network interconnect called seastar, which offers 3.8GBps/direction. This is probably what will be used in the X1's successor.
The Cray XD1, which your colleague bought, is a product cray acquired when they bought OctigaBay. They use an interconnect called the RapidArray switch, which provides 4GBps/direction of interconnect.
All of these interconnects are high-bandwidth and low latency. The XD1 is also very inexpensive for a cray, which is always nice.
Yes, true.
And NEC has done a great job of doing this for the last several generations of their vector machines. I have not ever programmed for an SX, and don't know much about them. The really nice thing about the X1, is that under the covers it's running Irix, which is a pretty reasonable Unix variant. Anyone know anything about super/UX?
Which is true only insomuch as the old-time reputation could not possibly exist today. That was the cold war, this is the post-cold-war era. The old cray was a mammoth beast with its own share of myopia, but a lot of technical talent. This allowed a few really brilliant concepts, and a lot of clever implementation to power two decades of brilliant computers. That said, they were brilliant solutions for the era. Old-school cray systems were built in an era when doing fundamental pieces of math was still pretty difficult, and the government was willing to pay ten million dollars for a machine that was proficient at doing math, and many tens of millions for a machine that was really good at it.
The difficult problems in building computers have changed, and the financial climate around supercomputers has changed quite a lot. Among other things, CMOS finally became fast enough to put bipolar in its grave, single microprocessor workstations became powerful enough to do all but the hardest of scientific tasks, and the average price of high performance (not top 10 on the list, but still fast) computers has plummeted. To ask the new cray to be like the old cray would be foolish.
That said, New Cray still offers impressive products. All of Cray's 3 product lines have much lower entry-prices than similar crays of the 90's. They all have more manageable power/cooling/physical size characteristics. They make much greater use of industry standard disks and networks, and also can be administered and programmed much more like any other unix computer. You program a New Cray more or less the same as other contemporary HPC systems.
When cdc introduced the 6600, the president of IBM complained to his staff, asking (paraphrase) 'how has cdc managed to best IBM's fastest computer with a staff of just 14 engineers and 4 programmers?' Seymour Cray responded "It seems like Mr. Watson has answered his own question." Because new Cray is tiny does not mean that it is not capable of making impressive innovations. Old Cray's Gorilla days were very wasteful, and not necessarily full of the best moments of innovation.
Now, if only they could put four X1e CPUs into an air-cooled, rack-mount server and charge a reasonable amount for it. I'd much rather have a handful of vector processors than a few dozen opterons, any day.
Cray is a small company.
They probably hire an outside communications firm to do public relations.
Apple is more than a computer company. Ride the train someday and count the little white earbuds. iPods are becoming as ubiquitous as cell phones. While IBM, HP, and Microsoft are turning microchips and source code into corporate/industrial tools (you know, exciting, like a forklift, or a conference room), Apple has been turning chips and code into a lifestyle. How many people buy magazines about fashion, about sexy-looking cars, about rock and roll? Next Question: how many people buy magazines about high efficiency diesel generators?
Gee, I wonder why Apple gets some attention.
It takes a lot more than some core logic chips to build a big server, and takes more than a big server to acquire any market share. Look at the industry. Sun still dominates the Unix server market with expensive machines full of slow processors. Mainframes are still a billion dollar business.
While I'm excited to see the possibility, you're not going to get anyone to spend this much money on a 64-way opteron box until they have been on the market for years, have been tested and tried, and have lots of software vendors lining up behind them. Don't believe me? Then ask why Unisys doesn't have more market share. Substitute xeon for opteron, and that's basically what they've been trying to sell for the last six years. The only thing opteron has that the es7000 doesn't is market hype.
The real change is the reduced number of people who need 32-way boxes of any sort. How many people are really outgrowing an 8-core opteron/xeon box? That number keeps shrinking.