InfiniBand Drivers Released for Xserve G5 Clusters
A user writes, "A company called Small Tree just announced the release of InfiniBand drivers for the Mac, for more supercomputing speed. People have already been making supercomputer clusters for the Mac, including Virginia Tech's third-fastest supercomputer in the world, but InfiniBand is supposed to make the latency drop. A lot. Voltaire also makes some sort of Apple InfiniBand products, though it's not clear whether they make the drivers or hardware."
The article is still subscriber-only, but Linux Weekly News has a good summary of some discussion on the LKML about InfiniBand. Greg K-H's original posting can be found here. Basically, he feels that it's impossible to implement the specification for InfiniBand in a free/open source product without violating the licensing agreement of the spec, because of patent infringement.
http://www.oreillynet.com/pub/a/network/2002/02/04/windows.html
This is a short intro to InfiniBand.
"InfiniBand breaks through the bandwidth and fanout limitations of the PCI bus by migrating from the traditional shared bus architecture into a switched fabric architecture."
"Each connection between nodes, switches, and routers is a point-to-point, serial connection. This basic difference brings about a number of benefits:
Because it is a serial connection, it only requires four wires, as opposed to the wide parallel connection of the PCI bus.
The point-to-point nature of the connection provides the full capacity of the connection to the two endpoints because the link is dedicated to the two endpoints. This eliminates the contention for the bus as well as the resulting delays that emerge under heavy loading conditions in the shared bus architecture.
The InfiniBand channel is designed for connections between hosts and I/O devices within a Data Center. Due to the well defined, relatively short length of the connections, much higher bandwidth can be achieved than in cases where much longer lengths may be needed."
"The InfiniBand specification defines the raw bandwidth of the base 1x connection at 2.5Gb per second. It then specifies two additional bandwidths, referred to as 4x and 12x, as multipliers of the base link rate. At the time that I am writing this, there are already 1x and 4x adapters available in the market. So, InfiniBand will be able to achieve much higher data transfer rates than is physically possible with the shared bus architecture, without the fan-out limitations of the latter."
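To make the 1x/4x/12x multipliers concrete, here is a rough back-of-the-envelope sketch. The 2.5 Gb/s base rate comes from the quote above; the 8b/10b encoding overhead is an assumption added here (it is not mentioned in the quoted article), and `link_rates` is just an illustrative helper name.

```python
# Raw signaling rate of the base 1x link, from the quoted text.
BASE_RATE_GBPS = 2.5

# Assumption (not from the quote): 8b/10b line encoding, so 8 of every
# 10 bits on the wire carry payload.
ENCODING_EFFICIENCY = 8 / 10


def link_rates(width):
    """Return (raw Gb/s, data Gb/s, data MB/s) for a link of the given width."""
    raw_gbps = BASE_RATE_GBPS * width
    data_gbps = raw_gbps * ENCODING_EFFICIENCY
    data_mbytes = data_gbps * 1000 / 8  # decimal megabytes per second
    return raw_gbps, data_gbps, data_mbytes


for width in (1, 4, 12):
    raw, data, mbytes = link_rates(width)
    print(f"{width:>2}x: raw {raw:5.1f} Gb/s, data {data:5.1f} Gb/s, about {mbytes:6.0f} MB/s")
```

These are per-direction link ceilings under the stated assumption; what an application actually sees will be lower once the host bus and software stack get involved.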
This is cool. The Xserve is a great server. We got one at work and we used it as a mirror for a while before switchover. This thing never crashes. According to one of the articles, these drivers will optimize the power of these beasts...
The Virginia Tech cluster isn't on the top 500 list anymore:
from http://www.top500.org/lists/2004/06/trends.php
* The 'SuperMac' at Virginia Tech, which made a very impressive debut 6 months ago, is off the list. At least temporarily. VT is replacing hardware and the new hardware was not in place for this TOP500 list.
People have already been making supercomputer clusters for the Mac, including Virginia Tech's third-fastest supercomputer in the world, but InfiniBand is supposed to make the latency drop.
Note that V.T.'s cluster already uses InfiniBand, courtesy of Mellanox.
It's mentioned at V.T.'s pages.
The BigMac at VA Tech missed the list this year because they were busy switching over to DP G5 Xserves. Last I heard, they had completed the project and were busy re-benchmarking the beast. I also heard that it was poised to move to possibly number 2 on the list after it was officially retested. The Army's version of the BigMac will probably take that title away, though. Then 2 of the top 3 machines will be G5 based. Too cool!
Why doesn't anything interesting happen when I have mod points?
Here's how bandwidth and latency break down for interconnect technologies:
1. Quadrics (EXPENSIVE! and closed standard) sub 4 microsec
2. InfiniBand (Relatively inexpensive, open standard) 4.5 microsec
3. Myrinet (Roughly the same price as IB, but closed standard) sub 10 microsec
4. GigE (cheap) 20+ microsec
All latency numbers are hardware not software latencies. Depending on how good your MPI stack is you can often triple those numbers.
There are so few companies making IB because there is only one chipset manufacturer right now. Mellanox. All the companies making IB products are startups and it will be a while before things get better.
No offense, but you don't know what you're talking about. IB can sustain transfer rates of 700 MB/s; the best I've ever seen from GigE was almost an order of magnitude lower, not to mention the two orders of magnitude drop in latency with IB. That might not mean much to you, but I guarantee you it's a big deal for folks with big parallel scientific codes.
Oh, and your pricing's wrong too. In the quantities you'd need it for a decent size cluster, IB gear is about the same cost as its direct competitors (Myrinet and Quadrics).
Doubtful. IBM's BlueGene is the king right now (well, for the time being), but I don't see Big Mac (either version) beating the Earth Simulator. Still, 2 out of the top 4 isn't bad.
Monstar L
3. Myrinet (Roughly the same price as IB, but closed standard) sub 10 microsec
Myrinet is not a closed standard. It's an ANSI-VITA standard (26-1998). The specs are available for free (http://www.myri.com/open-specs/) and anybody can build and sell Myrinet switches, if they have the technology.
Furthermore, the latency is sub 4 microsec. Come to SuperComputing next month and you will see.
Small Tree also makes cool multiport gigabit Ethernet cards that support 802.3ad bonding. Really, the gigE cards are the more interesting thing for most of us who don't have a supercomputing cluster to run. The two-port version is less than $300. They work on Linux as well.
http://small-tree.com/mp_cards.htm
Gigabit has a latency of about 100 microseconds and realistic throughput of about 50MB/s. Infiniband has a latency of about 15 microseconds and a throughput of about 500MB/s.
I mostly sell small Apple workgroup clusters of 16 nodes, and these are almost always just a gigE backbone. There are certain classes of problems that can benefit from Infiniband at low node counts, but for the most common apps, like gene searching using BLAST, gigE is just fine.
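A rough way to see when those numbers matter is the usual first-order model, time = latency + size / bandwidth. The sketch below plugs in the GigE and InfiniBand figures quoted in this comment; the model itself and the function name `transfer_time_us` are illustrative assumptions, not measurements.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way time in microseconds: latency + size / bandwidth."""
    # 1 MB/s equals 1 byte per microsecond (decimal megabytes).
    return latency_us + size_bytes / bandwidth_mb_per_s


# (latency in microseconds, throughput in MB/s), as quoted in the comment.
LINKS = {
    "GigE": (100.0, 50.0),
    "InfiniBand": (15.0, 500.0),
}

for size in (64, 4096, 1_000_000):
    for name, (latency, bandwidth) in LINKS.items():
        estimate = transfer_time_us(size, latency, bandwidth)
        print(f"{size:>9} bytes over {name:<10}: {estimate:10.1f} us")
```

Under this model, small messages are dominated by the latency term and large ones by the bandwidth term, which fits the point that loosely coupled jobs such as BLAST searches gain little from the faster fabric.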
- "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
Hi,
where did you get these numbers?
If you really want to compare the latency of actual interconnects you should use the official performance results achieved in real environments using the driver api:
(values from homepages)
1. SCI (Dolphinics): 1.4 us
2. Quadrics: 1.7 us
3. InfiniBand: 4.5 us
4. Myrinet: 6.3 us
MPI latency and bandwidth depend heavily on the MPI library. I suggest comparing the MPICH results.
I rated these interconnects. But I'm sorry, I only have a German version.
http://stef.tvk.rwth-aachen.de/research/interconn
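For anyone wondering how latency figures like these are usually obtained: the common approach is a ping-pong test, where a small message is bounced back and forth and half the mean round-trip time is reported. The sketch below only illustrates the method over a local socket pair using the Python standard library; it is not MPI, it does not touch any of the interconnects listed above, and the function names are made up for this example.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        part = sock.recv(size - len(buf))
        if not part:
            raise ConnectionError("peer closed the connection")
        buf.extend(part)
    return bytes(buf)


def echo(sock, rounds, size):
    """Bounce each received message straight back to the sender."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def ping_pong_latency_us(rounds=1000, size=8):
    """Return the mean one-way latency in microseconds (half the round trip)."""
    a, b = socket.socketpair()
    worker = threading.Thread(target=echo, args=(b, rounds, size))
    worker.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)
        recv_exact(a, size)
    elapsed = time.perf_counter() - start
    worker.join()
    a.close()
    b.close()
    return elapsed / rounds / 2 * 1e6


if __name__ == "__main__":
    print(f"one-way latency: {ping_pong_latency_us():.1f} us")
```

The number this prints reflects the local kernel and the Python interpreter, so it is only useful for seeing how the measurement works. Real comparisons need the same loop written against each vendor's driver API or the same MPI library on every fabric.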
October 14, 2004 Pg. 54
http://www.netlib.org/benchmark/performance.pdf
http://appleturns.com/scene/?id=4980
"Calm down, Beavis; take a closer look at the third and fourth entries and you'll realize that they're the same exact cluster, before and after its owners added another 64 processors to it. In much the same way, System X is also listed in the seventh, ninth, and eleventh slots, with scores taken at various points along its journey to life as a complete 1,100-Xserve system. Factor out the doubles and, barring an "October Surprise," System X ought to sit in fifth place, under an Alpha cluster, a new Itanium2 system, the once-mighty Earth Simulator, and the new top dog, that chunk of IBM's unfinished BlueGene. Woo-hoo, PowerPCs in two of the top five! No other chip can say that."
~hylas
Actually, IB is VERY closely related to PCI Express. At one point they were the same thing, and that was called 3GIO by Intel.
Normal people worry me!
Yes, Twelve Captures Fifth (10/14/04).
Back in 2002, the tech press couldn't understand the difference between InfiniBand and 3GIO (PCI-e), so they incorrectly spun it as if the two were in competition with each other. There never was any real relationship.
Can one program them in Python or Perl, or only "Real Programmers(tm)" languages like Java and C++?
They can be programmed in any language. Fortran and C are by far the most common choices. It's common to see Perl and shell scripts used as glue between standalone modular programs. It's about the only place where you'll still occasionally find hand assembly in the inner loops, though that's becoming less common as more compilers support MMX, SSE, etc. instructions.
You won't find a lot of interpreted languages doing heavy lifting in HPC. While a typical server is I/O bound (disk or net) and so can spare CPU cycles on an interpreted language, in HPC the CPU is normally pegged.
To say IB network management tools are better is a great understatement. Part of Myrinet's design is that the network topology is forced to be simple and the switches as dumb as possible (the task of routing and mapping the network is distributed to the nodes). IB switches offer a tad more functionality and offload mapping work to the switch, but IB stays a source-routed network (which is the chief way these technologies achieve low latency, while Ethernet is switch routed and therefore scales poorly as the switches have more and more work to do).
Of course, until IB over fiber media comes around, Myrinet cabling is a hell of a lot easier to deal with: longer lengths, more bendable, and a tighter bend radius.
XML is like violence. If it doesn't solve the problem, use more.
IB already exists over fibre. Most folks don't use it because it is much more expensive than copper solutions. Copper is going 10-15 meters these days. Mellanox and Gore just announced 40 meters. http://www.marketwire.com/mw/release_html_b1?release_id=73927
The quality of 4x IB cable has gotten much better over the last two years. It will continue to improve as 10 GigE also uses the same style cable.
-- soldack