Domain: myri.com
Stories and comments across the archive that link to myri.com.
Comments · 39
-
Re:Channel bonding
The parent is correct. Here is one example of a 10GbE card transmitting at wire speed (9.7 - 9.9 Gbps): http://www.myri.com/scs/performance/Myri10GE/
The "5 Gbps" bottleneck mentioned by the grandparent is due to 10GbE NICs often being installed on a relatively slow 100MHz PCI-X bus, whose practical bandwidth is only: 100 (MT/s) * 64 (bits) * 0.8 (efficiency of a PCI bus) ~= 5.1 Gbps. Fully exploiting the throughput of 10GbE requires at minimum a NIC on (1) PCI-X 2.0 266MHz+, or (2) x8 PCI Express 1.0, or (3) x4 PCI Express 2.0.
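The bus arithmetic above can be checked with a short Python sketch. The 0.8 efficiency factor is the one quoted in the comment; the PCI Express figures assume the standard per-lane signaling rates (2.5 GT/s for 1.0, 5 GT/s for 2.0) with 8b/10b encoding and ignore packet overhead. The function names are made up for illustration.

```python
# Rough usable bandwidth of the buses mentioned above, in Gbps.
# Function and constant names are hypothetical, chosen for this sketch.

PCI_EFFICIENCY = 0.8      # efficiency figure quoted in the comment
ENCODING_8B10B = 8 / 10   # line-encoding overhead of PCIe 1.0 and 2.0

def pcix_gbps(mega_transfers_per_s, width_bits=64, efficiency=PCI_EFFICIENCY):
    """Practical bandwidth of a parallel PCI/PCI-X bus."""
    return mega_transfers_per_s * 1e6 * width_bits * efficiency / 1e9

def pcie_gbps(lanes, gigatransfers_per_s):
    """Per-direction bandwidth of a PCIe 1.0/2.0 link after 8b/10b encoding."""
    return lanes * gigatransfers_per_s * ENCODING_8B10B

if __name__ == "__main__":
    print(f"PCI-X 100 MHz:     {pcix_gbps(100):.2f} Gbps")
    print(f"PCI-X 2.0 266 MHz: {pcix_gbps(266):.2f} Gbps")
    print(f"PCIe 1.0 x8:       {pcie_gbps(8, 2.5):.2f} Gbps")
    print(f"PCIe 2.0 x4:       {pcie_gbps(4, 5.0):.2f} Gbps")
```

Under these assumptions only the 100 MHz PCI-X bus falls below the 10 Gbps line rate, which matches the comment's list of minimum requirements.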
-
Re:Ethernet speed vs. PCI/PCI-X/PCIe speeds
You mean like Myricom?
http://www.myri.com/Myri-10G/10gbe_solutions.html
Not bad prices, list is 795USD for a fibre optic card and 500USD for a SR optic or 900USD for a LR optic. A CX4 card is only 695USD. With switches like the HP Procurve 2900 having 10GbE CX4 as standard, I predict that 2007 is the year when 10GbE really moves mainstream. -
Re:Irresistible
"For those with a short memory, NEC is a "baby bell" (an AT&T spin-off)."
BZZT! NEC http://www.nec.com/ is Nippon Electric Company, an immense Japanese conglomerate founded in 1899. You're probably thinking of NCR, which was swallowed, never digested, and subsequently regurgitated by AT&T.
Your third real option is probably Fujitsu or IBM. The issue isn't just the interconnects, as you can buy those from Myricom http://www.myri.com/ or Quadrics http://www.quadrics.com/. PNNL did this for their monstrous Itanium-2 system. It's also memory bandwidth, disk throughput, and that some jobs really require vector processors.
What Microsoft brings to the table, beyond incompatibility, overhead, and confusion, is really beyond me at the moment. They have a possibility of making an impact on high-throughput systems, but I can't see what they offer high-performance users. -
Re:This is really too bad...but do they really offer anything new or different?
Yes! They are the only people offering really huge machines with a flat memory space. Where I work, we have problems which need several terabytes of RAM (no, we can't use swap space) and dozens of processors. For most people that means a large cluster. Sadly, there are a small number of algorithms, like mine, which can't be efficiently manipulated onto a cluster because even with something like Myrinet the communications latency is too great. For problems of that sort, SGI are pretty much the only game in town.
Finally, please stop associating SGI with 3D graphics. These days that is only a small part of their business.
-
Re:Neat
Myranet or Infinband
Just some minor corrections and information for those interested.
Myricom is the company, Myrinet is the protocol. Infiniband is an open protocol. Myrinet has a maximum speed of 2.2Gb/sec while Infiniband can scale up to 30Gb/sec on a 16x PCI-E card and a 12x port on the switch.
As for what BlueGene/L uses, I don't think I'm at liberty to discuss that.
-
Re:Mac Mini Cluster??
If you are interested, hop on over to amazon.com, check out a few books, fascinating topic. I took a class in it, and I have just seen the tip of the iceberg.
Yes, fascinating. You "took a class in it", while many here have been working in networking for many years and some even in parallel computing.
I couldn't care less what the wiki says or what you think it says. Ethernet comes in lots of grades, but mostly, ethernet is considered to have various bandwidths well below 100Mbit/s (most think it is 10Mbit/s, no more, no less; most are wrong), while fast ethernet is 100Mbit/s. Yes, what do you know, the wiki is not that specific (ethernet on thickwire tends to be 1Mbit/s, and yes, I have used it).
The Mac Mini does fast ethernet.
Regarding latency in clusters, the system designer needs to take into consideration what will suffice given financial constraints. For some, 100Mbit ethernet will be fine, but then for others, specialized low latency multi-point interfaces are required (like Myrinet).
It seems you really have just seen the tip of the iceberg. Get your scuba gear on and come back when you are more clued up and less of a smart arse.
-
Re:Partnerships
You've said it. Really, what Creative needs to do is to add Myrinet support to its players.
Now, that would be bandwidth, with the side advantage that you could even stop imagining a beowulf cluster of those.
-
Re:Comparison with Myrinet
3. Myrinet (Roughly the same price as IB, but closed standard) sub 10 microsec
Myrinet is not a closed standard. It's an ANSI-VITA standard (26-1998). The specs are available for free (http://www.myri.com/open-specs/) and anybody can build and sell Myrinet switches, if they have the technology.
Furthermore, the latency is sub 4 microsec. Come to SuperComputing next month and you will see. -
Comparison with Myrinet
I've always understood that Myrinet is one of the better latency products available.
And it has MacOSX Drivers:
http://www.myri.com/scs/macosx-gm2.html
Myrinet is used by 39% of the Top500 list published in November 2003
http://www.force10networks.com/applications/roe.asp?content=9 -
Re:The issues are progress and long-term usefulnes
Yes you are right,
We have on our Linux cluster right now 36 serial jobs (one machine) and 75 parallel jobs (more than one) in the queue. This is on ethernet.
But what about Infiniband and Myrinet?
http://www.myri.com/
Both of these right now plug into 64bit pci and keep the cpus full up to 80% vs gigabit ethernet doing only like 50%. So the statement that PCI is what is slowing down clusters is false as far as I have seen in my work here. -
Myrinet?
You mean Myrinet?
-
Re:IBM's Blue Gene
-
Jonathan Postel not listed?
I find it hard to believe Jonathan didn't make the list.
-
Re:Bad Conclusions
64 way systems are the definition of vendor lock-in. You're only going to be running qualified hardware on something like that, and paying real $$ for support. And Linux? That is more of a marketing ploy than anything. If you aren't running HP/UX on that machine, you need your head examined. HP/UX is going to have pretty much all of the commercial applications for hardware like that and much better support.
As far as Penguin's offerings for clustering, maybe they are meaningfully better than Apple's offerings, although I'm not sure how.
"Funky" hardware eh? I'm curious as to what generally useful stuff you can't get on the Mac. You mean stuff like GPIB controllers, or maybe Myrinet, or maybe even more obscure? Frankly, if it's funky, that pretty well implies specialized. I think it would be foolish to let the availability of some obscure, special purpose widget for limited use influence my choice for a common server or desktop platform.
Current Mac systems combine the power of Unix with the only platform even hinting at rivaling Windows for standard productivity applications, including MS Office.
-
Re:yes
-
What's going on here?
OK, let's look at this objectively. Exactly how many clusters has Penguin deployed over the last 2 years, at least well-known ones? How many of these are in the current Top500 list? Has Penguin had a presence at Supercomputing in the past 2 years, other than having Sam walking around? Are these guys one of the 4 Myrinet authorized vendors in the US? None, no, and no. I really don't see how Penguin can think they're going to compete in this market space when there are so many other kick-ass Linux companies out there who specialize in Beowulf clusters, such as Atipa, LinuxNetworks, Microway, Aspen, etc., all with very large install bases. Penguin may be able to cut into the desktop and/or server market but I don't see them cutting it in the Beowulf arena.
It's very difficult to make money on software in the Beowulf arena because, duh, it's FREE! You have to make your money on hardware and integration of the hardware and software. Seems that there'll be lots of overhead with all of the developers now on hand at Penguin. Maybe this is why the CEO of Penguin, Marty Sayer, left 2 years ago and is now a VP at AMD.
In addition, for the most part Clustermatic does the same thing and is set up exactly the same as a Scyld distribution, although Scyld does add some neat things of its own. Scyld has actually turned a profit of late, and don't get me wrong, I like Becker, but I really don't see this one working out in the long run. -
Re:Microsoft recommending Linux Beowulf cluster?
Try using a Beowulf-style cluster for a CFD problem, and watch as all computation grinds to a halt as your processors and interconnects devote all their capacity to inter-node coherency and synchronization.
Not true. I admin a 52 proc Athlon 2100+ Beowulf-style cluster that runs primarily 3-D CFD code. The code uses Fortran and MPI, and does FFTs across two of the dimensions, requiring huge amounts of communication. We use a Myrinet backbone which gets some impressive stats. Since MPI is the only thing that uses the Myrinet (NFS uses standard ethernet), it's difficult to saturate the switch before maxing out the capabilities of the machines.
Sure, you can probably get an SGI machine that processor for processor will outperform this machine, but in research, cost is the bottom line. The entire cluster only cost ~$85,000. That's only about $1600/processor. That's with 2.5 TB of total useful storage and over 40GB of memory. On a $/MFlop basis, SGI can't come anywhere near that. -
Check out...
Myrinet Software. Not only does it support Windows plus a whole range of *NIXes.
-
Re:Connections through PCI bus?
There are chips designed to connect two PCI busses together, called PCI-PCI Bridges. For instance, I have an Intel dual port ethernet card with one:
Bus 0, device 12, function 0: PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 3). Master Capable. Latency=64. Min Gnt=4.
But you can't use this to connect a rack of computers. For one thing the max cable length for connecting two busses would be just a few inches. For putting PCI cards in 1.75" high 1U rackmount cases, there are PCI risers with a short ribbon cable that connects to the PCI slot. Even these short cables often cause timing problems. For instance, with the riser, cards that would otherwise work in all the slots may only work in the first one or two.
But even if you could cable all the computers together on one giant PCI bus, it would still be a bad idea. A good 24 port gigabit ethernet switch (~$2000) has a 480MB/sec switching fabric, to support full speed full duplex on each port. 32-bit 33 MHz PCI is only about 132 MB/sec, not nearly as fast. You'd need a 64-bit 66 MHz PCI bus to keep up. And there are more expensive gbit switches with more ports that have 100 Gbit/sec fabric. And this is just gbit ethernet, the slowest and cheapest of the high speed interconnects used in modern Beowulf clusters.
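The PCI numbers in this paragraph follow from bus width times clock rate. A minimal Python sketch, assuming one transfer per clock and no protocol overhead; the 480 MB/sec fabric figure is taken from the comment as-is and not verified here, and the names are hypothetical:

```python
# Peak (theoretical) bandwidth of a parallel PCI bus, in MB/s.
# Names are hypothetical, chosen for this sketch.

SWITCH_FABRIC_MB_PER_S = 480  # figure quoted in the comment, not verified

def pci_peak_mb_per_s(width_bits, clock_mhz):
    """One transfer of width_bits per clock cycle, no protocol overhead."""
    return width_bits / 8 * clock_mhz

def compare_to_fabric(width_bits, clock_mhz):
    """Describe how a given PCI bus compares with the quoted switch fabric."""
    peak = pci_peak_mb_per_s(width_bits, clock_mhz)
    verdict = "keeps up with" if peak >= SWITCH_FABRIC_MB_PER_S else "falls short of"
    return f"{width_bits}-bit {clock_mhz} MHz PCI: {peak:.0f} MB/s, {verdict} the fabric"

if __name__ == "__main__":
    for width, clock in [(32, 33), (64, 66)]:
        print(compare_to_fabric(width, clock))
```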
There are faster ways to connect computers than gigabit ethernet. The EE Times article is very untechnical, but this one has some more information. LLNL has used a very fast and very expensive interface called Quadrics. This is probably the fastest way to connect computers in a Beowulf. People like Cray/SGI and IBM have faster things still, but they cost real big bucks. Other ways to connect a Beowulf are the above mentioned gigabit ethernet (~$100-$250 a node for up to 24 nodes), Myrinet (~$1400-$2000 /node up to 128 nodes), and SCI hardware and software (~$1400-$2100 /node). Myrinet uses a switch like gigabit ethernet and the largest switch they have is 128 ports. SCI is switchless: each card has multiple cables (1-3) and is connected into a ring, 2D or 3D torus. -
Re:Here's what you're missing
With networked clusters you're always going to have latencies, orders of magnitude higher than with single-image supercomputers.
While your point about ethernet latency is valid, you should be aware that, for somewhat more money, you can get 2 Gb/s throughput and about 7 µs latency. More info at myri.com.
The gap between supercomputer and desktop is getting narrower each year. Eventually you will buy your computer by the pound.
-
Re:What comes next?
Really, my point is that right now, we don't have a standards based high speed interconnect other than Ethernet, and that is quickly running out of steam. I'm vaguely familiar with Myrinet, but it hasn't seemed to have caught on. (I just read that Myrinet is standards based.)
Even then, those standards you mention are all about clustering, not general purpose networking. Ethernet has really caught on in the 30 or so years it has been around because it has been very adaptable- maybe that adaptability is exactly what is preventing it from progressing further. -
Re:Myrinet
I can definitely appreciate the desire to avoid vendor lock-in. However, Myricom has not placed any artificial barriers in the way to keep people from competing with them. Myrinet is an ANSI standard, just as open as Ethernet. It just happens to be much more expensive to produce hardware and software that performs the way Myrinet does than to make Ethernet NICs and drivers.
-
Re:gigE from each segment of the CCD
Another option is to wait for 10gigE (along with the rest of the supercomputing world) or go with Myrinet, which has recently broken the 1 gigabit barrier.
Recently? Myrinet has been doing 2 gigabits full-duplex since May 2001 when it started using fiber. Not to mention that full link utilization only uses a few percent of the host CPU. What's the point of fast cluster interconnect when you use half your CPU sending packets through the TCP/IP stack? -
Re:Forgive my naiveness but
The guy is Danny Cohen, who gave us the terms big-endian and little-endian for computer architectures, and also started Myrinet. Imagine a Beowulf clu*SLAP* Ouch!
-
bandwidth is sometimes overrated
It is well known that ethernet has lousy latency. My favorite analogy: calculate the bandwidth of a freight train filled with 80 GB hard drives and steamed cross country. Great bandwidth, lousy latency.
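The freight train analogy can be put into numbers with a small Python sketch. The drive count and trip time below are purely hypothetical values picked for illustration; only the 80 GB drive size comes from the comment.

```python
# Bandwidth vs. latency of a train full of hard drives.
# Drive count and trip duration are hypothetical illustration values.

def train_bandwidth_gbps(num_drives, drive_gb, trip_hours):
    """Average throughput in Gbps if every drive arrives full of data."""
    total_bits = num_drives * drive_gb * 1e9 * 8
    return total_bits / (trip_hours * 3600) / 1e9

def train_latency_s(trip_hours):
    """Time until the first bit arrives: the whole trip."""
    return trip_hours * 3600

if __name__ == "__main__":
    drives, size_gb, hours = 1_000_000, 80, 72  # assumed, not from the source
    print(f"bandwidth: {train_bandwidth_gbps(drives, size_gb, hours):.0f} Gbps")
    print(f"latency:   {train_latency_s(hours)} s")
```

With these assumed inputs the train moves data at thousands of Gbps on average, yet nothing arrives for three days.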
This is why building a cluster using ethernet is not a great idea if the communication to computation ratio is very high. Some clusters I've seen use Myrinet, which offers high bandwidth and low latency at the cost of wire distance (and price). -
Speed v. Latency
Most clustering applications depend on the latency of the network, not the overall speed. A majority of the applications written do not require massive amounts of data to be transferred in order to distribute jobs to the nodes.
What is optimal is a low-latency network protocol that can quickly pass jobs to other nodes. If you're into clustering applications that require the movement of large amounts of data AND want low latency, then products like Myrinet are a good choice. Be prepared to shell out some cash for it, though.
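A common first-order way to see the point about latency versus speed is to model transfer time as latency plus message size divided by bandwidth. The sketch below is a simplification under that assumption, and the two network profiles are hypothetical numbers, not measurements of any product.

```python
# First-order model: time = latency + size / bandwidth.
# Both network profiles are hypothetical, for illustration only.

def transfer_time_s(message_bytes, latency_s, bandwidth_bps):
    """Time to deliver one message under the simple latency + size/bandwidth model."""
    return latency_s + message_bytes * 8 / bandwidth_bps

SLOW_NET = {"latency_s": 100e-6, "bandwidth_bps": 1e9}  # assumed values
FAST_NET = {"latency_s": 10e-6, "bandwidth_bps": 2e9}   # assumed values

def speedup(message_bytes):
    """How many times faster the low-latency profile delivers a message."""
    return (transfer_time_s(message_bytes, **SLOW_NET)
            / transfer_time_s(message_bytes, **FAST_NET))

if __name__ == "__main__":
    for size in (64, 1_000_000):
        print(f"{size:>9} bytes: {speedup(size):.2f}x faster on the low-latency network")
```

For small job-dispatch messages the gain tracks the latency ratio; for large transfers it approaches the bandwidth ratio, which is why the two workloads in the comment call for different networks.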
-
Re:#include "Wry smile.h"
3. (You need a big room for this) "Unless you downsize it". Oh sure, if you can also "downsize" your users. You have to build something that you can get inside of. There's a limit to downsizing - at the limit is your body!
Really, it doesn't have to take up *that* much space. I've worked in the CAVE before, and while it's somewhat large, there's no reason it couldn't be made smaller these days, given that three-gun CRT projectors are outmoded, and with a single gun projector you could use much smaller mirrors. Make the walls 6' high, and you'd need a room maybe 15' by 15', perhaps 20' by 20' to get the job done. And the ceiling needs to be maybe 8' tall, given some planning. (the CAVE system doesn't have a ceiling surface, but it has a floor - there's a projector pointed straight down via a mirror).
Granted, the 12 CPU Onyx2 (named Cassatt) takes up some room... but that is physically located in a different room in the Beckman setup, and video is piped in.
So basically you need:
- 4 LCD projectors, at a minimum of $1200 or so each for cheesy ones.
- 3 rear projection surfaces - since we're using cheap projectors, you may as well use white bedsheets attached to a homebuilt wooden frame
- 3 or 4 small-ish mirrors, unless you've got room to put the projectors all around the area.
- (the biggie) Lots of CPU, and software.
But I digress. As you can see, it's doable.
-
Re:Impractical
-
Re:Impractical
-
Linux at LLNL
There is a lot of visualization research happening at Lawrence Livermore National Laboratory that's using Linux. A lot of the boxes that we do our day-to-day work on are boxes running RedHat 7.1. We're researching how to best use the latest nVidia drivers with GeForce 3 cards.
I've personally been working on scalable parallel rendering. We have a couple Linux clusters that we're working with. The one that I work on is a 32-node cluster with a Myrinet interconnect. Each box has hardware graphics in it. That cluster is hooked up to several displays so that we can explore very large tiled displays. I'm working on a project called Chromium that's hosted at SourceForge.
So I think you could say that the researchers in the DOE are very interested in what Linux can do. -
SCSI over IP... what about IP over SCSI?
-
Re:Latency
In those cases you *can* tune your code
I'm not sure it is always possible.
The situation you describe leaves whatever interconnect in place idle for the majority of the time.
The problem is not having interconnect idle for the majority of the time, the problem is having idle nodes because communication needs too much time.
If you want a big cluster (several hundreds or thousand nodes) for high performance computing for programs with important communication, the cluster needs a low latency high bandwidth network like Myrinet, Giganet or TNet for example. Choosing a cheap network will waste a lot of CPU cycles.
-- -
TCP record performance still for FreeBSD AFAIK
AFAIK a team of researchers working on FreeBSD still has the record for TCP performance, using FreeBSD/Alpha on a Myrinet network.
See..
- Duke Computer Scientists Exceed "Gigabit" Data Processing Speeds With Internet Software
- DUKE COMPUTER SCIENTISTS EXCEED 'GIGABIT' DATA FLOW SPEEDS WITH INTERNET SOFTWARE
The performance reached was 1.147 billion bps on a single TCP connection... Way over what Gigabit Ethernet or ATM are even physically able to do. Those boards are really fast...
Anyone know about more recent results?
-
Re:not Beowulf?
You're making a category mistake. MPI and PVM are message-passing libraries. LINDA is a programming language that uses a tuple space stored in distributed shared memory (see here for more info). HACMP is a completely different beast, see IBM's homepage.
Beowulf != any of these. Beowulf is the idea that one can take commodity, off the shelf (COTS) components and build a powerful machine at a price far less than a comparable commercial offering.
Codes run on Beowulf, and really any parallel machine, typically use MPI, PVM, or custom message passing libraries. The Beowulf idea includes the use of MPI & PVM, among other freely available software packages. Codes that run on shared memory machines typically use the shared memory device of MPI, shared memory, or pthreads.
For CPU intensive tasks the Beowulf idea is great. Codes that perform lots of disk I/O suffer, as adding higher performance (i.e. SCSI) disks increases system cost greatly. Communication intensive tasks perform the worst on Beowulf-style clusters compared to commercial computers, as the interconnect on Beowulf-style clusters can't compare. For a relatively large increase in cost, one can use Myrinet. With Myrinet bandwidth and latency begin to approach that of the switch found on the IBM SP series of machines.
With high bandwidth, low latency interconnect technologies that scale well (e.g. Myrinet), one can build a cluster that outperforms a comparable commercial offering at, say, one quarter to one eighth the price. The difference at that point is software. There's really not a lot out there to configure and administer Beowulf-style clusters, and commercial implementations of some packages beat the pants off of their freely available counterparts (compilers, for example). Until the software situation changes there is still reason to buy your big iron from IBM, SGI, and Sun.
--Jason -
Re:Nifty
... these machines ARE massively parallel supercomputers, if you build them big enough and you use the best commodity networking (like myrinet).
-
You need to define what you need from your cluster
First: You need to define what you want out of your cluster - what kind of applications it is going to run, what sort of environment you want for them, how large a cluster you want to build, whether you want to do 'free cycle stealing', and whether you want high availability. A 'cluster' is much too vague a term for it to be possible to give much advice based on just that, or even further references.
Second: SCI is orthogonal to the other two technologies - it is a special hardware network technology (Scalable Coherent Interface), originally made to support distributed shared memory. You may be thinking of the software Dolphin Interconnect Solutions provide with their SCI solutions, but as far as I know, that doesn't directly enter into the same space, either. Their web pages certainly do not indicate that it does, and my discussions with (one of?) their Linux developer(s) implied that it contained somewhat more (lock managers etc), but not in the same space. A technology that competes with SCI, though proprietary, is Myrinet. This has a longer history than SCI, and has been less plagued with problems than SCI (though SCI is supposedly quite stable now).
Third: There are a bunch of other technologies (some cross-platform, some single-platform) that compete in making it easy to build clusters. MOSIX and Beowulf are just two of them. If you give more details of what you want to achieve, I'll dig out references from my collection (made to support the development of FreeBSD-specific clustering improvements, so some types of references may be lacking, but I'll probably be able to come up with at least some starting points for any wanted cluster workload.)
Eivind.
-
history/future of supercomputing
In my opinion, in order to put this into perspective, you need to look at the history of the subject at hand.
The first types of supercomputers were faster and better than typical computers because of the design and features put into them. They used faster components which were custom-built (and thus a lot more expensive) and had features like vector units which made them attractive to scientific applications (but again, more expensive).
Then, people started to think about how they could make supercomputers at the same or faster performance but bring the cost of producing them down. Rather than using expensive custom-built processors that had to be submerged in cooling fluid or using vector units to manipulate large arrays in a single operation, they started to develop new designs for supercomputers.
One new type of machine was the SMP-based system, such as the Cray PowerChallenge type of machine. In this machine, many processors share a common memory, just like in your 2-way or 4-way desktop boxes now. With these types of machines, the lack of vector units isn't such a big deal since you can instead just separate your array into N different portions (where N = the number of processors) and apply your vector operation in parallel over the processors in the system. The problem with these types of computers is that scaling up to large numbers of processors is difficult since contention for the system bus (to talk between the CPU and memory or I/O) gets complicated with the larger number of processors.
Another new type of machine was the Massively Parallel Processor (MPP) machine, such as the Cray T3D and T3E. In these types of machines, many processors (~1024) are interconnected with a very fast network. Each processor has its own individual memory, so the system can be scaled up to much greater numbers of processors. The problem is that now instead of having a single common shared memory, you have all these distributed memories and you have to use message passing techniques to get your data distributed around, which is a pain.
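The SMP approach described above, splitting an array into N portions and applying the same operation to each portion on a different processor, can be sketched in Python as follows. This is only an illustration of the partitioning idea: the function names are hypothetical, and Python threads will not give a real speedup for CPU-bound work the way separate processors on an SMP machine would.

```python
# Sketch of "split the array into N portions, one per processor".
# All names are hypothetical; threads stand in for processors.
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, n):
    """Split data into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def scale_chunk(chunk, factor):
    """The 'vector operation' applied to one portion of the array."""
    return [factor * x for x in chunk]

def parallel_scale(data, factor, n_workers):
    """Apply scale_chunk to each portion concurrently, then reassemble in order."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda c: scale_chunk(c, factor), chunks))
    return [x for chunk in results for x in chunk]

if __name__ == "__main__":
    print(parallel_scale(list(range(10)), 2, 4))
```

Because every worker reads and writes the same address space, no explicit data distribution is needed, which is the convenience the comment contrasts with message passing on MPP machines.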
So, this led researchers such as John Hennessy (at Stanford) to come up with a new architecture that uses Distributed Shared Memory (DSM). To the applications programmer, things appear to be a large shared memory (although if you touch certain parts of memory, access times are slower than touching other locations in memory -- since they have to be fetched from a remote machine). In fact what actually happens is that each processor still has its own local memory, but a processor on a very fast interconnect card coupled with each processor examines memory references and if it sees you are using memory that is not local to your processor, fetches the desired section of memory from the remote processor. So, it's sort of an MPP type system but appears to the programmer as sort of an SMP type system. This is what SGI/Cray sells as the Origin 2000. It's still cheaper to produce than traditional vector machines which use custom CPU's and memories (since it uses more commodity CPU's and components), but at the same time offers good relative performance.
Now, in the late 80's, Seymour Cray decided that building supercomputers out of commodity components wasn't the right way to go. His opinion was that, all things being equal, you could always make a faster supercomputer if you used more expensive components and designed your supercomputer with that goal in mind (i.e., use SRAM for all memories, use the fastest technology in your CPU, etc.). To that end, he created a company called Cray Computers which was separate from Cray Research (i.e., Seymour was in charge of Cray Computers and had nothing to do with Cray Research). Cray Research produced the computers such as the PowerChallenge and T3E while Cray Computer continued to make expensive vector-type computers. Unfortunately what ended up happening was that Cray Computers folded because their machines were so expensive and the performance gain you got from them did not justify the greater cost. (Really, the only places that bought these types of computers were "spook sites" like the NSA, to the best of my knowledge.)
The pervading idea is that this trend towards computers that offer decent performance while costing significantly less will continue. This is the idea behind clusters such as the Beowulf or, more importantly, clusters like the NT Supercluster at NCSA. The NT Supercluster differs from a Beowulf in that it uses a more costly network adapter (specifically, a Myrinet adapter from Myricom) to allow internode communication to take place at higher bandwidths and lower latencies than a standard Ethernet. No, the performance of these types of machines is nowhere near what you get from a machine like the Origin 2000, but the idea is that you get comparable performance at a huge reduction in cost. Additionally, because the components used to construct these clusters are commodity components, everybody will be producing these components and continuing to improve their performance. So, the speed of cluster-based computing relative to machines like the Origin improves over time. [Disclaimer: I am one of the people who helped develop the technology in the NT Supercluster, so I have some bias.]
To say that SGI ruined Cray is no more true than to say that they ruined MIPS. The reason that people are not that interested in MIPS processors any more is that Intel processors are a commodity now. Everybody uses them, so the overall industry trend is to make Intel and Intel-related technologies faster and better since everybody works together in a sort of de facto way. Yes, probably the MIPS design is a much better processor design than the Intel design (it wouldn't be difficult), but the key thing is that everybody in industry is using Intel. This is the same reason that building supercomputers out of commodity components (i.e., clusters) will probably be the way things work in the future. -
The Roadrunner specs
Note that this is not a Beowulf cluster, as they wanted to include some components that aren't exactly cheap (e.g., the Myrinet).
Nodes: 64
CPU: Dual Pentium II 450's
Cache: 512 KB ECC RAM
RAM: 512 MB SDRAM
Hard Disk: 6.4 GB UDMA EIDE
Network: 100BaseT, Myrinet
OS: Linux 2.2.1, Redhat 5.2
Check out Roadrunner or Myrinet
Atticus, a UNM student.