InfiniBand Drivers Released for Xserve G5 Clusters



Posted by pudge on Friday October 15, 2004 @11:30AM from the insert-grunting-noise-here dept.

A user writes, "A company called Small Tree just announced the release of InfiniBand drivers for the Mac, for more supercomputing speed. People have already been making supercomputer clusters for the Mac, including Virginia Tech's third-fastest supercomputer in the world, but InfiniBand is supposed to make the latency drop. A lot. Voltaire also makes some sort of Apple InfiniBand products, though it's not clear whether they make the drivers or hardware."

12 of 134 comments

InfiniBand intro by hardlined · 2004-10-15 11:39 · Score: 5, Informative

http://www.oreillynet.com/pub/a/network/2002/02/04/windows.html

This is a short intro to InfiniBand.

"InfiniBand breaks through the bandwidth and fanout limitations of the PCI bus by migrating from the traditional shared bus architecture into a switched fabric architecture."

"Each connection between nodes, switches, and routers is a point-to-point, serial connection. This basic difference brings about a number of benefits:

Because it is a serial connection, it only requires four wires, as opposed to the wide parallel connection of the PCI bus.

The point-to-point nature of the connection provides the full capacity of the connection to the two endpoints because the link is dedicated to the two endpoints. This eliminates the contention for the bus as well as the resulting delays that emerge under heavy loading conditions in the shared bus architecture.

The InfiniBand channel is designed for connections between hosts and I/O devices within a Data Center. Due to the well defined, relatively short length of the connections, much higher bandwidth can be achieved than in cases where much longer lengths may be needed."

"The InfiniBand specification defines the raw bandwidth of the base 1x connection at 2.5Gb per second. It then specifies two additional bandwidths, referred to as 4x and 12x, as multipliers of the base link rate. At the time that I am writing this, there are already 1x and 4x adapters available in the market. So InfiniBand will be able to achieve much higher data transfer rates than is physically possible with the shared bus architecture, without the fan-out limitations of the latter."
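The 1x/4x/12x arithmetic in the quote can be checked with a few lines of Python. This is a sketch, assuming the original single-data-rate signalling of 2.5 Gb/s per lane and 8b/10b line encoding (10 bits on the wire for every 8 data bits), which is why a 1x link carries 2 Gb/s of payload; `link_rates` is a hypothetical helper, not part of any InfiniBand API.

```python
# Raw signalling rate of one InfiniBand lane (1x), as quoted above, in Gb/s.
BASE_LANE_GBPS = 2.5
# Assumption: 8b/10b line encoding, so 8 of every 10 bits on the wire carry data.
ENCODING_EFFICIENCY = 8 / 10


def link_rates(width: int) -> tuple[float, float]:
    """Return (raw Gb/s, data Gb/s) for a link `width` lanes wide (hypothetical helper)."""
    raw = BASE_LANE_GBPS * width
    return raw, raw * ENCODING_EFFICIENCY


for width in (1, 4, 12):
    raw, data = link_rates(width)
    # 1 Gb/s of payload is 125 MB/s (using 1 MB = 10**6 bytes).
    print(f"{width:>2}x: {raw:5.1f} Gb/s raw, {data:4.1f} Gb/s data, {data * 125:6.0f} MB/s")
```

Under these assumptions a 4x link moves about 1 GB/s of payload, comfortably above the 700 MB/s sustained figure mentioned later in the thread.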
Comparison with Myrinet by Sosarian · 2004-10-15 11:45 · Score: 5, Interesting

I've always understood that Myrinet is one of the better latency products available.

And it has MacOSX Drivers:
http://www.myri.com/scs/macosx-gm2.html

Myrinet is used by 39% of the systems on the Top500 list published in November 2003:
http://www.force10networks.com/applications/roe.asp?content=9
1. Re:Comparison with Myrinet by Anonymous Coward · 2004-10-15 12:11 · Score: 4, Informative
  
  Here's how bandwidth and latency break down for interconnect technologies:
  
  1. Quadrics (EXPENSIVE! and closed standard) sub 4 microsec
  2. InfiniBand (Relatively inexpensive, open standard) 4.5 microsec
  3. Myrinet (Roughly the same price as IB, but closed standard) sub 10 microsec
  4. GigE (cheap) 20+ microsec
  
  All latency numbers are hardware, not software, latencies. Depending on how good your MPI stack is, you can often triple those numbers.
  
  There are so few companies making IB because there is only one chipset manufacturer right now: Mellanox. All the companies making IB products are startups, and it will be a while before things get better.
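To see why a few microseconds per message matter, here is a rough sketch, assuming a binary-tree collective (such as a reduction) that needs ceil(log2 P) sequential message hops across P nodes, each hop costing the hardware latency from the list above times a software-overhead factor (the comment suggests up to 3x for a weak MPI stack). The model, the 1024-node example, and the `collective_latency_us` name are illustrative assumptions, not measurements of any real MPI library.

```python
import math

# Hardware latencies in microseconds, taken from the list above.
HW_LATENCY_US = {
    "Quadrics": 4.0,    # listed as "sub 4", so this is an upper bound
    "InfiniBand": 4.5,
    "Myrinet": 10.0,    # listed as "sub 10", so this is an upper bound
    "GigE": 20.0,       # listed as "20+", so this is a lower bound
}


def collective_latency_us(hw_latency_us: float, nodes: int, sw_factor: float = 1.0) -> float:
    """Latency-only cost of a binary-tree collective over `nodes` nodes (simplified model)."""
    hops = math.ceil(math.log2(nodes)) if nodes > 1 else 0
    return hops * hw_latency_us * sw_factor


for name, lat in HW_LATENCY_US.items():
    # sw_factor=3.0 models an MPI stack that triples the hardware latency.
    best = collective_latency_us(lat, 1024)
    worst = collective_latency_us(lat, 1024, sw_factor=3.0)
    print(f"{name:>10}: {best:6.1f} to {worst:6.1f} us per collective")
```

The cost grows with log2 of the node count, and a code that performs thousands of such collectives per time step pays it every time, so per-message latency compounds as a cluster scales.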
2. Re:Comparison with Myrinet by stef716 · 2004-10-15 14:25 · Score: 5, Informative
  
  Hi,
  
  where did you get these numbers?
  If you really want to compare the latency of actual interconnects, you should use the official performance results achieved in real environments using the driver API:
  (values from homepages)
  
  1. SCI (dolphinIcs) : 1.4 us
  2. Quadrics: 1.7 us
  3. Infiniband 4.5 us
  4. Myrinet 6.3 us
  
  MPI latency and bandwidth depend heavily on the MPI library. I suggest comparing the MPICH results.
  I have rated these interconnects, but I'm sorry, I only have a German version.
  
  http://stef.tvk.rwth-aachen.de/research/interconnects_docu.pdf
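Latency figures like the ones in these two lists are usually obtained with a ping-pong test: one side sends a small message, the other echoes it back, and the one-way latency is taken as half the mean round-trip time. The sketch below only illustrates the method: it uses a local `socket.socketpair()` with a thread as the echo peer, so it measures kernel socket overhead on one machine rather than any interconnect. A real comparison would use the driver API or an MPI benchmark, as the comment suggests.

```python
import socket
import threading
import time


def echo(sock: socket.socket, rounds: int, size: int) -> None:
    """Peer side: receive each message in full and send it straight back."""
    for _ in range(rounds):
        buf = b""
        while len(buf) < size:
            chunk = sock.recv(size - len(buf))
            if not chunk:
                return  # other side closed early
            buf += chunk
        sock.sendall(buf)


def ping_pong_latency_us(rounds: int = 1000, size: int = 8) -> float:
    """Estimated one-way latency in microseconds: half the mean round-trip time."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo, args=(b, rounds, size))
    peer.start()
    msg = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(msg)
        got = 0
        while got < size:
            chunk = a.recv(size - got)
            if not chunk:
                raise ConnectionError("echo peer closed the connection")
            got += len(chunk)
    elapsed = time.perf_counter() - start
    peer.join()
    a.close()
    b.close()
    return elapsed / rounds / 2 * 1e6


print(f"one-way latency: {ping_pong_latency_us():.1f} us")
```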
Not 3rd fastest, in fact not on list at all. by Anonymous Coward · 2004-10-15 11:45 · Score: 4, Informative

The Virginia Tech cluster isn't on the top 500 list anymore:

from http://www.top500.org/lists/2004/06/trends.php

* The 'SuperMac' at Virginia Tech, which made a very impressive debut six months ago, is off the list, at least temporarily. VT is replacing hardware, and the new hardware was not in place for this TOP500 list.
BigMac already has I.B. by mfago · 2004-10-15 11:46 · Score: 5, Informative

People have already been making supercomputer clusters for the Mac, including Virginia Tech's third-fastest supercomputer in the world, but InfiniBand is supposed to make the latency drop.

Note that V.T.'s cluster already uses InfiniBand, courtesy of Mellanox.

It's mentioned at V.T.'s pages.
Re:Proprietary Crap by tempest69 · 2004-10-15 11:52 · Score: 5, Informative

Infiniband is designed for extremely low latency, so their driver software is going to be really sensitive to latency. If they can make their NIC driver 0.5 µs faster than their competition, it's a huge change in total latency. That's only 2000 clock ticks, possibly 30-50 memory pulls, but in scientific computing, such as Computational Fluid Dynamics, it makes a huge difference. The more CPUs you scale to, the more important the latency. So their driver software is something that they are going to protect. It would be negligent to give it to the competition.
--
Storm
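Converting a time saving into clock cycles is just latency multiplied by clock frequency. A minimal sketch: the clock speeds below are examples chosen for illustration, and they show that 0.5 µs equals 2000 cycles only at a 4 GHz clock; at 2.0 GHz it is 1000 cycles.

```python
def latency_to_cycles(latency_s: float, clock_hz: float) -> float:
    """Number of CPU clock cycles that elapse during `latency_s` seconds."""
    return latency_s * clock_hz


saving_s = 0.5e-6  # the 0.5 microsecond driver improvement from the comment
for ghz in (2.0, 2.5, 4.0):
    cycles = latency_to_cycles(saving_s, ghz * 1e9)
    print(f"{ghz} GHz: {cycles:.0f} cycles")
```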
Re:Third fastest what? by ztirffritz · 2004-10-15 12:07 · Score: 4, Informative

The BigMac at VA Tech missed the list this year because they were busy switching over to DP G5 Xserves. Last I heard, they had completed the project and were busy re-benchmarking the beast. I also heard that it was possibly poised to move to number 2 on the list after it was retested officially. The Army's version of the BigMac will probably take that title away, though. Then 2 of the top 3 machines will be G5-based. Too cool!

--
Why doesn't anything interesting happen when I have mod points?
Re:Proprietary Crap by Durindana · 2004-10-15 12:14 · Score: 5, Insightful

gig-e can do everything infiniband can, WITH tcp, although without the same low latency of infiniband. infiniband just never caught on when it could, it was ahead of its time, but now gigabit ethernet is cheap, and soon ten gigabit ethernet will strip it dry.

[A 747] can do everything [the Joint Strike Fighter] can, although without the same [supersonic speed, air-to-air combat capability] of [the JSF]. [The JSF] just never caught on when it could, it was ahead of its time, but now [747s are] cheap, and soon [the Airbus A300] will strip it dry.

Smart. "Without the low latency of infiniband"? Idiot, what do you think it's for? We're not talking eDonkeying Halo 2 here... ultra-low latency is THE POINT.

Gee, hard decision, although with that price I can see why Mac users would go for it.

Oh wait, you're just a stupid fucking troll. Why don't you go die?
Proprietary, but definitely not crap by Anonymous Coward · 2004-10-15 12:18 · Score: 5, Informative

gig-e can do everything infiniband can, WITH tcp, although without the same low latency of infiniband.

No offense, but you don't know what you're talking about. IB can sustain transfer rates of 700 MB/s; the best I've ever seen from GigE was almost an order of magnitude lower, not to mention the two-orders-of-magnitude drop in latency with IB. That might not mean much to you, but I guarantee you it's a big deal for folks with big parallel scientific codes.
Oh, and your pricing's wrong too. In the quantities you'd need it for a decent size cluster, IB gear is about the same cost as its direct competitors (Myrinet and Quadrics).
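Bandwidth and latency can be combined with the simple Hockney model, in which sending n bytes takes t = latency + n / bandwidth. This sketch assumes 700 MB/s and 4.5 µs for InfiniBand (figures from this thread), and for GigE the 20 µs hardware latency from the earlier list plus the 125 MB/s theoretical ceiling of a 1 Gb/s link; both GigE values flatter it, and none of these are new measurements.

```python
def transfer_time_us(size_bytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """Hockney model: fixed start-up latency plus size divided by bandwidth."""
    # With 1 MB = 10**6 bytes, 1 MB/s moves exactly 1 byte per microsecond.
    return latency_us + size_bytes / bandwidth_mb_s


LINKS = {
    "InfiniBand": (4.5, 700.0),  # latency (us), sustained bandwidth (MB/s) from the thread
    "GigE": (20.0, 125.0),       # listed hardware latency; 1 Gb/s line-rate ceiling
}

for size in (8, 4096, 1_000_000):
    row = ", ".join(
        f"{name} {transfer_time_us(size, lat, bw):10.1f} us"
        for name, (lat, bw) in LINKS.items()
    )
    print(f"{size:>9} bytes: {row}")
```

For small messages the fixed latency term dominates and for large ones the bandwidth term does, which is why both figures in the comment matter to parallel codes.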
Re:Proprietary Crap by Kalak · 2004-10-15 13:14 · Score: 5, Insightful

OK, the "proprietary crap" discussed here is for:
#1 Xserves running (wait for it....) Mac OS X.
#2 Supercomputers

This is not your Linux box you're using for a NAT server, or a Beowulf running SETI, so if you're building a supercomputer, or just like drooling over them and thinking of using an expensive interconnect like InfiniBand, you're not looking to compare it to Beowulf over gigabit, and you're probably not going to care whether the drivers are binary-only or not.

This article is in no way related to any LKML posting other than it's the same company. This is about OSX Infiniband drivers. RTFA sometime, and you might realize such things.

Welcome to the Apple section. If you're not interested in discussion of things related to Apple, please uncheck the appropriate box in your preferences, and we will all be happier. If you like to run Linux on Apple Hardware, please examine the OS discussed before trolling.

If you want to troll about InfiniBand's policies affecting Linux, then wait until the LWN article is public ("Alternatively, this item will become freely available on October 21, 2004"), and submit it to /.'s general section (where I would be more than happy to consider it not trolling), and enjoy a livelier discussion there, with a wider, and more appropriate, audience.

--
I am, and always will be, an idiot. Karma: Coma (mostly effected by .hack)
Re:Proprietary Crap by Shinobi · 2004-10-15 14:17 · Score: 4, Insightful

Gig-E can't do half of what IB can in the segment that IB targets, and 10GigE can barely do half of what IB can do. First of all: GigE/10GigE is only practically useful together with TCP/IP. Congratulations, you just killed latency; there's no way you can come down to the 12-40 microseconds latency that IB achieves with real workloads. Second, the IB protocol handles traffic prioritization directly in the low-level protocols, and the same goes for the self-healing aspects, routing data around failures. Third, and this is the most neglected part among those who haven't worked with it: RDMA, directly in hardware and the low-level protocols. It lets your process advertise a memory region to the node that is sending it data, so that node can, for example, write directly into that memory. This allows you to build a system with fake shared memory and still retain 12-40 microsecond latencies, unlike slow and fugly hacks run on top of TCP/IP that try to achieve the same thing with latencies of up to 10ms.
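The RDMA idea described here, where a process registers a buffer and advertises it so a remote node can write into it without the receiver's CPU copying the data, can be illustrated with a toy model. This is purely a conceptual sketch: `MemoryRegistry`, `register`, and `rdma_write` are hypothetical names, everything runs in one process, and it does not resemble the real InfiniBand verbs API beyond the idea of a registered region identified by a key.

```python
class MemoryRegistry:
    """Toy stand-in for an HCA's table of registered memory regions (hypothetical)."""

    def __init__(self) -> None:
        self._regions: dict[int, bytearray] = {}
        self._next_key = 1

    def register(self, buf: bytearray) -> int:
        """Register a buffer and return the key its owner would advertise to peers."""
        key = self._next_key
        self._next_key += 1
        self._regions[key] = buf
        return key

    def rdma_write(self, key: int, offset: int, data: bytes) -> None:
        """One-sided write: place `data` directly into the registered buffer."""
        region = self._regions[key]  # an unknown key raises KeyError, like an access fault
        if offset < 0 or offset + len(data) > len(region):
            raise ValueError("write outside registered region")
        region[offset:offset + len(data)] = data


# The receiver registers a buffer and advertises the key; the sender writes into it.
registry = MemoryRegistry()
inbox = bytearray(16)
key = registry.register(inbox)
registry.rdma_write(key, 4, b"data")
print(inbox)
```

The receiver never calls a receive function; it simply finds the data in its own buffer, which is the property the comment is pointing at.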

One example of fake shared memory that I've seen is a cluster with an unusual design: two IBM p5 570s with a total of 32 cores and 128GB RAM, linked together via IBM's NUMA interconnect. They also had a total of 36 Infiniband HCAs. The slave nodes are Xserve G5s with 2GB RAM, Infiniband, and a Xilinx FPGA card that has its own memory banks. The slave nodes essentially work directly against the RAM on the two 570s, using their local RAM only as a form of cache. The project runs as a multi-threaded app on the 570s, and its work is slaved out to the nodes. The project was originally meant to be used with some p690s.
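The slave-node arrangement, where local RAM acts only as a cache in front of memory that really lives on the big hosts, behaves like a page cache. A minimal sketch, assuming fixed-size pages and least-recently-used eviction; `RemotePageCache` and `fetch_remote` are hypothetical names, and in a setup like the one described a miss would presumably be served over InfiniBand rather than by a Python callback.

```python
from collections import OrderedDict
from typing import Callable


class RemotePageCache:
    """LRU cache of remote pages held in limited local RAM (hypothetical sketch)."""

    def __init__(self, capacity_pages: int, fetch_remote: Callable[[int], bytes]) -> None:
        self.capacity = capacity_pages
        self.fetch_remote = fetch_remote  # called on a miss to read the page from remote memory
        self.pages: OrderedDict[int, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, page_no: int) -> bytes:
        if page_no in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.misses += 1
        data = self.fetch_remote(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


# Simulated remote memory: each page's content is derived from its page number.
cache = RemotePageCache(capacity_pages=2, fetch_remote=lambda n: bytes([n % 256]) * 4)
for page in (1, 2, 1, 3, 2):
    cache.read(page)
print(cache.hits, cache.misses)  # prints "1 4"
```

Every miss costs a full interconnect round trip, which is why such a design only makes sense with latencies in the microsecond range rather than the milliseconds of TCP/IP-based approaches.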