Slashdot Mirror


InfiniBand Drivers Released for Xserve G5 Clusters

A user writes, "A company called Small Tree just announced the release of InfiniBand drivers for the Mac, for more supercomputing speed. People have already been making supercomputer clusters for the Mac, including Virginia Tech's third-fastest supercomputer in the world, but InfiniBand is supposed to make the latency drop. A lot. Voltaire also makes some sort of Apple InfiniBand products, though it's not clear whether they make the drivers or hardware."

40 of 134 comments

  1. Proprietary Crap by ceswiedler · · Score: 3, Informative

    The article is still subscriber-only, but Linux Weekly News has a good summary of some discussion on the LKML about InfiniBand. Greg K-H's original posting can be found here. Basically, he feels that it's impossible to implement the specification for InfiniBand in a free/open source product without violating the licensing agreement of the spec, because of patent infringement.

    1. Re:Proprietary Crap by tempest69 · · Score: 5, Informative

      Infiniband is designed to be low latency to the extreme. Their driver software is going to be really sensitive to latency. If they can make their NIC driver 0.5 usec faster than their competition's, it's a huge change in total latency. That's only 2000 clock ticks, possibly 30-50 memory pulls. But for scientific computing, and Computational Fluid Dynamics in particular, it makes a huge difference. The more CPUs you scale to, the more important the latency. So their driver software is something that they are going to protect. It would be negligent to give it to the competition. Storm
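The cycle count above is just latency multiplied by clock rate. A minimal sketch of that arithmetic follows; the 4 GHz clock is an assumption, inferred because it is the rate at which 0.5 usec equals the post's 2000 ticks, and the 2.0 GHz line is only there for comparison.

```python
def latency_in_cycles(latency_us: float, clock_ghz: float) -> float:
    """Clock cycles that elapse during a given latency.

    1 GHz is 1e9 cycles per second and 1 us is 1e-6 s,
    so cycles = latency_us * clock_ghz * 1e3.
    """
    return latency_us * clock_ghz * 1e3


# Assumed 4 GHz clock: the rate that makes 0.5 us equal 2000 cycles.
print(latency_in_cycles(0.5, 4.0))
# A 2.0 GHz clock, for comparison: the same saving is half as many cycles.
print(latency_in_cycles(0.5, 2.0))
```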

    2. Re:Proprietary Crap by Sosarian · · Score: 2, Insightful

      Of course that misses the point about getting low latencies to improve Beowulf cluster performance by a factor over Ethernet.

    3. Re:Proprietary Crap by Johannes · · Score: 2, Interesting

      It's as much crap as other technologies like IEEE 1394 (Firewire). Greg is concerned with the patent licensing requirements for Infiniband, which is a valid concern, but is no different than the requirements for other technologies that have support under Linux.

      In particular, Infiniband requires licensing under RAND terms, similar to that of IEEE 1394.

    4. Re:Proprietary Crap by Durindana · · Score: 5, Insightful

      gig-e can do everything infiniband can, WITH tcp, although without the same low latency of infiniband. infiniband just never caught on when it could, it was ahead of its time, but now gigabit ethernet is cheap, and soon ten gigabit ethernet will strip it dry.


      [A 747] can do everything [the Joint Strike Fighter] can, although without the same [supersonic speed, air-to-air combat capability] of [the JSF]. [The JSF] just never caught on when it could, it was ahead of its time, but now [747s are] cheap, and soon [the Airbus A300] will strip it dry.

      Smart. "Without the low latency of infiniband"? Idiot, what do you think it's for? We're not talking eDonkeying Halo 2 here... ultra-low latency is THE POINT.

      Gee, hard decision, although with that price I can see why Mac users would go for it.


      Oh wait, you're just a stupid fucking troll. Why don't you go die?
    5. Re:Proprietary Crap by gl4ss · · Score: 2, Insightful

      proprietary drivers for a proprietary OS that is run on proprietary hardware (an OS that's only legal to run on that hardware maker's hardware, too).

      so if you're there, you're already pretty deep in "proprietary crap".

      --
      world was created 5 seconds before this post as it is.
    6. Re:Proprietary Crap by LoRdTAW · · Score: 3, Informative

      Pay $5k for infiniband hardware or $40 for a gig ethernet card?

      Where did you get this dollar amount, and what exactly is it for: the HCA, a switch, cables, or all of them? HCAs are about 800-1000 dollars. Switches from Mellanox start at about $8,000 for a 24-port switch with a 480Gbit backplane, and go up to $66,000 for a full 96-port modular switch. Cables, though, I will admit are costly at 100 bucks for a 4X 2m cable.

      Also your statement that it is useless is complete FUD. It certainly will never gain the widespread use of Ethernet, but it serves a niche market for a standard high speed interconnect.

      You obviously have no clue as to what Infiniband is or is capable of. First off, 4x Infiniband is 10 times faster than Ethernet, at 10 gigabits/sec. And second, it has lower latency and CPU utilization than common commodity GbE hardware like the $40 GbE adapter you speak of. GigE and TCP are quite inefficient when compared to Infiniband; even if you bought TOE cards like the 7711 from Adaptec, you're still paying as much as you would for an Infiniband HCA and have 10 times less bandwidth. OK, so the switches are expensive, but the throughput is still incredible, with a 24-port switch having 480Gbits of bandwidth at about 8000 bucks! The more expensive GigE switches commonly used in clusters are almost as costly as the Infiniband switches.
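The 480 Gbit figure is consistent with counting all 24 ports at 10 Gb/s in both directions at once. The sketch below assumes that is how the number was derived (the post does not say) and also works out the price per Gb/s from the roughly $8,000 price quoted.

```python
def switch_aggregate_gbps(ports: int, link_gbps: float, full_duplex: bool = True) -> float:
    """Aggregate capacity with every port sending (and, if full duplex, receiving) at line rate."""
    directions = 2 if full_duplex else 1
    return ports * link_gbps * directions


capacity = switch_aggregate_gbps(24, 10.0)  # 24-port 4X switch from the comment (assumed 10 Gb/s per port)
cost_per_gbps = 8000 / capacity             # using the ~$8,000 price quoted above
print(f"{capacity:.0f} Gb/s aggregate, about ${cost_per_gbps:.2f} per Gb/s")
```

This reproduces the 480 Gbit figure at about $16.67 per Gb/s. Counting the 8 Gb/s data rate per 4X port instead of the 10 Gb/s signalling rate gives 384 Gb/s.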

      I even read that a 1024-node cluster using GbE was just as fast as a 256-node cluster using IB, mainly because each node in the GbE cluster was mostly dealing with Ethernet and TCP/IP rather than the actual application.

      So before you start talking out of your ass do some research like I did. I might not be 100% correct but I think I am close.

    7. Re:Proprietary Crap by Kalak · · Score: 5, Insightful

      OK, the "proprietary crap" discussed here is for:
      #1 XServes running (wait for it....) Mac OS X.
      #2 Supercomputers

      This is not your Linux box you're using for a NAT server, or a Beowulf running SETI, so if you're building a supercomputer, or just like drooling over them and thinking of using an expensive interconnect like InfiniBand, you're not looking to compare it to a Beowulf over gigabit, and you're possibly not likely to care whether the drivers are binary-only or not.

      This article is in no way related to any LKML posting other than it's the same company. This is about OSX Infiniband drivers. RTFA sometime, and you might realize such things.

      Welcome to the Apple section. If you're not interested in discussion of things related to Apple, please uncheck the appropriate box in your preferences, and we will all be happier. If you like to run Linux on Apple Hardware, please examine the OS discussed before trolling.

      If you want to troll about InfiniBand's policies affecting Linux, then wait until the LWN article is public ("Alternatively, this item will become freely available on October 21, 2004"), submit it to /.'s general section (where I would be more than happy to consider it not trolling), and enjoy a livelier discussion there, with a wider and more appropriate audience.

      --
      I am, and always will be, an idiot. Karma: Coma (mostly effected by .hack)
    8. Re:Proprietary Crap by Anonymous Coward · · Score: 2, Funny
      But they should give it away for FREE!! I want FREE stuff! Gimme FREE stuff, this is slashdot! Information wants to be FREE (as in, fucking GIVEN to me without any effort on my part whatsoever)! Stick it to the man!!

      (Dress this up in a bunch of stupid rhetoric, and you have the typical response around here.)

    9. Re:Proprietary Crap by Tiosman · · Score: 3, Interesting

      You obviously have no clue as to what Infiniband is or is capable of. First off, 4x Infiniband is 10 times faster than Ethernet, at 10 gigabits/sec.

      4X InfiniBand is 10 Gb/s signal rate but actually 8 Gb/s data rate (8b/10b encoding). This is one of many facts that the IB marketing dept. keeps forgetting (I keep telling them, but they won't listen for some reason).
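The 8b/10b overhead is easy to check: every 8 bits of payload travel as a 10-bit symbol, so the usable rate is 80% of the signalling rate. A minimal sketch, covering only the line coding and ignoring packet headers and other protocol overhead:

```python
def data_rate_gbps(signal_gbps: float, payload_bits: int = 8, coded_bits: int = 10) -> float:
    """Usable bit rate after line coding (8b/10b sends 10 line bits per 8 data bits)."""
    return signal_gbps * payload_bits / coded_bits


signal = 10.0  # 4X signalling rate in Gb/s
data = data_rate_gbps(signal)
print(f"4X: {signal} Gb/s signalling -> {data} Gb/s data, about {data * 1000 / 8:.0f} MB/s")
```

At 8 Gb/s, a 4X link tops out around 1000 MB/s before any packet overhead is taken off.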

      GigE and TCP are quite inefficient when compared to Infiniband

      TCP over InfiniBand is just as inefficient; that has nothing to do with GigE. People use IP over GigE because it's convenient, but you can use GigE without IP if you talk directly to the hardware. Some have tried and are still trying (http://www.disi.unige.it/project/gamma/), but the main problem is the lack of hardware documentation from GigE vendors and the short life span of GigE chips.

      I even read that a 1024 node cluster using GbE was just as fast as a 256 node cluster using IB

      It's interesting to note that there are not many 256-node clusters in production with IB at the moment, and even fewer with 1024 nodes. Second, just as fast doing what? A pointless benchmark specially tuned for InfiniBand, like the ones IB supporters usually publish, or real-world applications? Yes, high-speed interconnects make a difference, but GigE is just fine for a lot of the HPC applications I have seen so far.

      So before you start talking out of your ass do some research like I did.

      Don't believe everything you read, and don't drink the Kool-Aid that fast. Look at the Top500 just to see what machines are out there, not for the ranking (Linpack is useless). You will see that there are quite a lot of GigE clusters and not that many IB ones. It's a matter of economics: if IB makes sense, people will buy it. These days, they buy much more GigE (or other interconnects) than IB.

    10. Re:Proprietary Crap by Shinobi · · Score: 4, Insightful

      Gig-E can't do half of what IB can in the segment that IB targets, and 10GigE can barely do half of what IB can do. First of all: GigE/10GigE is only practically useful together with TCP/IP. Congratulations, you just killed latency; there's no way you can come down to the 12-40 microseconds of latency that IB achieves with real workloads. Second, the IB protocol handles traffic prioritization directly in the low-level protocols, and the same goes for the self-healing aspects, routing data around failures. Third, and this is the most neglected part among those who haven't worked with it: RDMA, done directly in hardware and the low-level protocols. It lets your process announce memory space to the node that is sending it data, so that node can, for example, write directly into that memory. This allows you to build a system with fake shared memory and still retain 12-40 microsecond latencies, unlike slow and fugly hacks run on top of TCP/IP that try to achieve the same thing with latencies of up to 10 ms.
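To make the RDMA idea concrete, here is a toy, single-process model of a one-sided write: the receiver registers (announces) a window of its memory and hands out a key, and the sender writes into that window directly. Everything here (the `Node` class, `register`, `rdma_write`, the integer keys) is hypothetical and only loosely mirrors the registered-memory-plus-remote-key scheme real HCAs use; it is not the verbs API and says nothing about latency.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Node:
    """Toy host with a flat memory and a table of windows it has exposed to peers."""
    memory: bytearray
    regions: Dict[int, Tuple[int, int]] = field(default_factory=dict)  # key -> (base, length)
    next_key: int = 1

    def register(self, base: int, length: int) -> int:
        """Expose memory[base:base+length] and return the key a peer must present."""
        if base < 0 or base + length > len(self.memory):
            raise ValueError("region lies outside this node's memory")
        key = self.next_key
        self.next_key += 1
        self.regions[key] = (base, length)
        return key


def rdma_write(target: Node, key: int, offset: int, data: bytes) -> None:
    """One-sided write: bytes land directly in the target's registered window.

    Nothing on the target side is called to receive the data; the only checks
    are that the key is known and the write stays inside the window.
    """
    if key not in target.regions:
        raise PermissionError("unknown key: window was never registered")
    base, length = target.regions[key]
    if offset < 0 or offset + len(data) > length:
        raise ValueError("write falls outside the registered window")
    target.memory[base + offset : base + offset + len(data)] = data


receiver = Node(bytearray(16))
key = receiver.register(base=4, length=8)  # expose bytes 4..11 only
rdma_write(receiver, key, 0, b"abcd")      # sender writes without a matching receive
print(receiver.memory[4:8])
```

The point of the model is the division of labour: the receiver does its work once, at registration time, and afterwards the sender's writes need no cooperation from the receiving process.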

      One example of fake shared memory that I've seen is a cluster with an unusual design: two IBM p5 570s with a total of 32 cores and 128GB RAM, linked together via IBM's NUMA interconnect. They also had a total of 36 InfiniBand HCAs. The slave nodes are Xserve G5s with 2GB RAM, InfiniBand, and a Xilinx FPGA card that has its own memory banks. The slave nodes essentially work straight against the RAM on the two 570s, using their local RAM only as a form of cache. The project runs as a multi-threaded app on the 570s, and the work is slaved out to the nodes. The project was originally meant to be used with some p690s.

    11. Re:Proprietary Crap by Barto · · Score: 3, Informative

      You're missing the point: if the spec were made open (NOT the driver software), open source drivers could be developed, increasing demand for InfiniBand products, reducing costs for users and for the InfiniBand vendors, and improving compatibility.

    12. Re:Proprietary Crap by Johannes · · Score: 3, Interesting

      What's wrong with OpenIB?

    13. Re:Proprietary Crap by glenkim · · Score: 2, Funny
      If so, then why bother? Alternative network stacks over gigabit Ethernet would be much cheaper and can be reasonably competitive in terms of latency with well-written code.



      There was a dead project that I read about a few months ago that had 20-microsecond latency over 100 Mbit Ethernet. If anybody knows what I'm talking about, I would appreciate a refresher.

  2. Imagine by commodoresloat · · Score: 3, Funny

    installing Infiniband on a single unit G5....

  3. Shocking by CMiYC · · Score: 3, Insightful

    With so few companies left doing anything InfiniBand-related, it makes you wonder what the thinking is here.

  4. InfiniBand intro by hardlined · · Score: 5, Informative

    http://www.oreillynet.com/pub/a/network/2002/02/04/windows.html

    This is a short intro to InfiniBand.

    "InfiniBand breaks through the bandwidth and fanout limitations of the PCI bus by migrating from the traditional shared bus architecture into a switched fabric architecture."

    "Each connection between nodes, switches, and routers is a point-to-point, serial connection. This basic difference brings about a number of benefits:

    Because it is a serial connection, it only requires four wires, as opposed to the wide parallel connection of the PCI bus.

    The point-to-point nature of the connection provides the full capacity of the connection to the two endpoints because the link is dedicated to the two endpoints. This eliminates the contention for the bus as well as the resulting delays that emerge under heavy loading conditions in the shared bus architecture.

    The InfiniBand channel is designed for connections between hosts and I/O devices within a Data Center. Due to the well defined, relatively short length of the connections, much higher bandwidth can be achieved than in cases where much longer lengths may be needed."

    "The InfiniBand specification defines the raw bandwidth of the base 1x connection at 2.5Gb per second. It then specifies two additional bandwidths, referred to as 4x and 12x, as multipliers of the base link rate. At the time that I am writing this, there are already 1x and 4x adapters available in the market. So, InfiniBand will be able to achieve much higher data transfer rates than is physically possible with the shared bus architecture, without the fan-out limitations of the latter."
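The width multipliers, and the contrast with a shared parallel bus, can be checked with a short calculation. This sketch assumes the 2.5 Gb/s base rate quoted above and classic 32-bit/33 MHz and 64-bit/66 MHz PCI with one transfer per clock; the PCI figures are peak rates shared by every device on the bus, while each InfiniBand link is dedicated to its two endpoints.

```python
BASE_1X_GBPS = 2.5  # raw rate of a 1x link, per the quoted article


def ib_raw_gbps(width: int) -> float:
    """Raw rate of a 1x, 4x or 12x link: the 1x rate times the lane count."""
    if width not in (1, 4, 12):
        raise ValueError("the spec defines 1x, 4x and 12x widths")
    return BASE_1X_GBPS * width


def pci_peak_gbps(bus_bits: int, clock_mhz: float) -> float:
    """Peak rate of a parallel PCI bus (width x clock), shared by all devices on it."""
    return bus_bits * clock_mhz / 1000.0


for w in (1, 4, 12):
    print(f"{w}x link: {ib_raw_gbps(w)} Gb/s raw, dedicated to one pair of endpoints")
print(f"32-bit/33 MHz PCI: {pci_peak_gbps(32, 33):.3f} Gb/s peak, shared")
print(f"64-bit/66 MHz PCI: {pci_peak_gbps(64, 66):.3f} Gb/s peak, shared")
```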

  5. speeeeed... by jwind · · Score: 2, Informative

    This is cool. The Xserve is a great server. We got one at work and used it as a mirror for a while before switchover. This thing never crashes. According to one of the articles, these drivers will optimize the power of these beasts...

  6. Comparison with Myrinet by Sosarian · · Score: 5, Interesting

    I've always understood that Myrinet is one of the better latency products available.

    And it has MacOSX Drivers:
    http://www.myri.com/scs/macosx-gm2.html

    Myrinet is used by 39% of the Top500 list published in November 2003
    http://www.force10networks.com/applications/roe.asp?content=9

    1. Re:Comparison with Myrinet by Anonymous Coward · · Score: 4, Informative

      Here's how bandwidth and latency break down for interconnect technologies:

      1. Quadrics (EXPENSIVE! and closed standard) sub 4 microsec
      2. InfiniBand (Relatively inexpensive, open standard) 4.5 microsec
      3. Myrinet (Roughly the same price as IB, but closed standard) sub 10 microsec
      4. GigE (cheap) 20+ microsec

      All latency numbers are hardware not software latencies. Depending on how good your MPI stack is you can often triple those numbers.

      There are so few companies making IB because there is only one chipset manufacturer right now: Mellanox. All the companies making IB products are startups, and it will be a while before things get better.

    2. Re:Comparison with Myrinet by Tiosman · · Score: 2, Informative

      3. Myrinet (Roughly the same price as IB, but closed standard) sub 10 microsec

      Myrinet is not a closed standard. It's an ANSI-VITA standard (26-1998). The specs are available for free (http://www.myri.com/open-specs/) and anybody can build and sell Myrinet switches, if they have the technology.

      Furthermore, the latency is sub 4 microsec. Come to SuperComputing next month and you will see.

    3. Re:Comparison with Myrinet by stef716 · · Score: 5, Informative

      Hi,

      where did you get these numbers?
      If you really want to compare the latency of actual interconnects you should use the official performance results achieved in real environments using the driver api:
      (values from homepages)

      1. SCI (dolphinIcs) : 1.4 us
      2. Quadrics: 1.7 us
      3. Infiniband 4.5 us
      4. Myrinet 6.3 us

      MPI latency and bandwidth depend heavily on the MPI library. I suggest comparing the MPICH results.
      I rated these interconnects, but I'm sorry, I only have a German version.

      http://stef.tvk.rwth-aachen.de/research/interconnects_docu.pdf
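Latency figures like these are usually measured with a ping-pong test: bounce a tiny message back and forth many times and report half the average round trip. The sketch below applies that method to plain TCP over the loopback interface, so the result reflects the operating system's network stack rather than any interconnect or MPI library; the loopback address, 1-byte message, and iteration count are arbitrary choices.

```python
import socket
import threading
import time


def echo_once(server: socket.socket) -> None:
    """Accept one client and echo everything it sends until it disconnects."""
    conn, _ = server.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(64)
            if not data:  # empty read: the client closed the connection
                break
            conn.sendall(data)


def pingpong_latency_us(iterations: int = 1000) -> float:
    """Return half the mean round-trip time of a 1-byte message, in microseconds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
    worker.start()
    client = socket.create_connection(server.getsockname())
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # disable Nagle batching
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            client.sendall(b"x")
            client.recv(1)  # block until the echo comes back
        elapsed = time.perf_counter() - start
    finally:
        client.close()
        worker.join()
        server.close()
    return elapsed / iterations / 2 * 1e6


if __name__ == "__main__":
    print(f"one-way latency: {pingpong_latency_us():.1f} us")
```

Keeping the loop and swapping the transport underneath is how differences in software stacks show up in published latency numbers.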

    4. Re:Comparison with Myrinet by Junta · · Score: 3, Informative

      To say IB network management tools are better is a great understatement. Part of Myrinet's design is that the network topology is forced to be simple and the switches as dumb as possible (the task of routing and mapping the network is distributed to the nodes). IB switches offer a tad more functionality and offload mapping work to the switch, but IB stays a source-routed network (which is the chief way these technologies achieve low latency, while Ethernet is switch-routed and therefore scales poorly as the switches have more and more work to do).
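The difference between source routing and switch (table) routing can be sketched with a toy fabric. The topology, port numbers, and function names below are made up for illustration. Myrinet is the well-known source-routed example; the sketch does not model any real InfiniBand or Ethernet switch.

```python
from typing import Dict, List

Fabric = Dict[str, Dict[int, str]]  # device -> {output port: neighbouring device}

# Hypothetical fabric: hostA - sw1 - sw2 - hostB, with hostC also attached to sw1.
FABRIC: Fabric = {
    "hostA": {0: "sw1"},
    "sw1": {0: "hostA", 1: "sw2", 2: "hostC"},
    "sw2": {1: "sw1", 3: "hostB"},
}


def source_routed(fabric: Fabric, src: str, route: List[int]) -> List[str]:
    """The sender puts every output port in the header; each hop just strips
    the next entry and forwards on that port, with no lookup of its own."""
    header = list(route)  # copy so the caller's route is not consumed
    node, visited = src, [src]
    while header:
        port = header.pop(0)
        node = fabric[node][port]
        visited.append(node)
    return visited


def table_routed(fabric: Fabric, tables: Dict[str, Dict[str, int]],
                 src: str, dst: str) -> List[str]:
    """Destination routing: every device looks dst up in its own forwarding table."""
    node, visited = src, [src]
    while node != dst:
        port = tables[node][dst]  # per-hop decision made inside the switch
        node = fabric[node][port]
        visited.append(node)
    return visited


TABLES = {
    "hostA": {"hostB": 0, "hostC": 0},
    "sw1": {"hostB": 1, "hostC": 2},
    "sw2": {"hostB": 3},
}

print(source_routed(FABRIC, "hostA", [0, 1, 3]))
print(table_routed(FABRIC, TABLES, "hostA", "hostB"))
```

Both calls trace the same path; what differs is where the routing work happens. In the source-routed version the switches do no lookups, which is the property the comment ties to low latency, at the cost of every sender needing to know the topology.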

      Of course, until IB over fiber media comes around, myrinet cabling is a hell of a lot easier to deal with, longer lengths, more bendable, and tighter bend radius.

      --
      XML is like violence. If it doesn't solve the problem, use more.
    5. Re:Comparison with Myrinet by soldack · · Score: 2, Informative

      IB already exists over fibre. Most folks don't use it because it is much more expensive than copper solutions. Copper is going 10-15 meters these days. Mellanox and Gore just announced 40 meters. http://www.marketwire.com/mw/release_html_b1?release_id=73927

      The quality of 4x IB cable has gotten much better over the last two years. It will continue to improve as 10 GigE also uses the same style cable.

      --
      -- soldack
  7. Not 3rd fastest, in fact not on list at all. by Anonymous Coward · · Score: 4, Informative

    The Virginia Tech cluster isn't on the top 500 list anymore:

    from http://www.top500.org/lists/2004/06/trends.php

    * The 'SuperMac' at Virginia Tech, which made a very impressive debut 6 month ago is off the list. At least temporarily. VT is replacing hardware and the new hardware was not in place for this TOP500 list.

    1. Re:Not 3rd fastest, in fact not on list at all. by derdesh · · Score: 3, Informative
      Thursday's As the Apple Turns has an episode speculating that Virginia Tech's cluster should come in at number 5 in the new list.

      (The link should be good until sometime this weekend; then it will be available in re-runs)

  8. BigMac already has I.B. by mfago · · Score: 5, Informative

    People have already been making supercomputer clusters for the Mac, including Virginia Tech's third-fastest supercomputer in the world, but InfiniBand is supposed to make the latency drop.

    Note that V.T.'s cluster already uses InfiniBand, courtesy of Mellanox.

    It's mentioned at V.T.'s pages.

  9. Well thank goodness.... by Killer+Eye · · Score: 3, Funny

    ...Halo and UT2004 were starting to slow down on my 1200 CPU cluster!

    --
    "Microsoft killed my company, I hold a personal grudge. I don't use Microsoft products and neither should you."-JWZ
  10. Re:Third fastest what? by ztirffritz · · Score: 4, Informative

    The BigMac at VA Tech missed the list this year because they were busy switching over to DP G5 Xserves. Last I heard, they had completed the project and were busy re-benchmarking the beast. I also heard that it was poised to move to possibly number 2 on the list after it was retested officially. The Army's version of the BigMac will probably take that title away, though. Then 2 of the top 3 machines will be G5-based. Too cool!

    --
    Why doesn't anything interesting happen when I have mod points?
  11. Proprietary, but definitely not crap by Anonymous Coward · · Score: 5, Informative
    gig-e can do everything infiniband can, WITH tcp, although without the same low latency of infiniband.

    No offense, but you don't know what you're talking about. IB can sustain transfer rates of 700 MB/s; the best I've ever seen from GigE was almost an order of magnitude lower, not to mention the two orders of magnitude drop in latency with IB. That might not mean much to you, but I guarantee you it's a big deal for folks with big parallel scientific codes.

    Oh, and your pricing's wrong too. In the quantities you'd need it for a decent size cluster, IB gear is about the same cost as its direct competitors (Myrinet and Quadrics).

    1. Re:Proprietary, but definitely not crap by sjames · · Score: 3, Insightful

      It would be nice to see IB actually come together, but it's an uphill battle. The Spec is a massive tangly mess. Vendor infighting and politics has nearly killed it dead two or three times now. The last thing it needs is for the specs to be priced like they're printed on gold leaf and patent battles to boot.

      Meanwhile, I've seen lightweight reliable non-IP protocols over bog standard GigE hardware get 10 microsecond latency and as a result, 90MB/s ACTUAL transfer.

      Given that, 10GigE could give IB a real run, especially if it's coupled with an onboard DMA engine (there's no reason it can't be). Consider that with the right protocol, GigE can get a little over 90% of theoretical, if 10GigE manages that, it'll beat IB.
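The closing comparison is simple arithmetic. Assuming 10 Gb/s as the 10GigE data rate and 8 Gb/s as the 4X InfiniBand data rate left after 8b/10b coding, 90% efficiency on 10GigE would indeed exceed IB's ceiling. The efficiency value is the comment's hypothetical, not a measurement.

```python
def effective_gbps(line_rate_gbps: float, efficiency: float) -> float:
    """Payload throughput achieved at a given fraction of the line rate."""
    return line_rate_gbps * efficiency


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert Gb/s to MB/s, taking MB = 10**6 bytes."""
    return gbps * 1000 / 8


ten_gige = effective_gbps(10.0, 0.90)  # hypothetical: 10GigE at the >90% seen on GigE
ib_4x = 10.0 * 8 / 10                  # 4X IB data rate after 8b/10b coding
print(f"10GigE at 90%: {ten_gige:.1f} Gb/s ({gbps_to_mb_per_s(ten_gige):.0f} MB/s)")
print(f"4X IB ceiling: {ib_4x:.1f} Gb/s ({gbps_to_mb_per_s(ib_4x):.0f} MB/s)")
print("10GigE ahead:", ten_gige > ib_4x)
```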

      There's a lot of good things about IB, but if the IBTA really wants it to catch on, they'd better start acting like they WANT people to buy it. Right now, IB's best chance looks to be the OpenIB project. However, if the IBTA decides to try locking it up tighter and tighter, OpenIB won't save them, the rest of the industry will do clever things with 10GigE and save itself a bunch of patent headaches.

    2. Re:Proprietary, but definitely not crap by sjames · · Score: 2, Insightful

      SUNET is having problems with 10GigE when they reach around 50-60%

      If they're using IP, I'm not at all surprised. IP is designed to provide reasonable performance in a heterogeneous, unreliable network. In a cluster environment, you would want the protocol to provide excellent performance in a homogeneous, error-free environment and simple correctness (likely at a terrible penalty) in an unreliable environment.

      The problem is that IP has to deal with fragmentation, out-of-order delivery and moderately frequent lost packets. It has to support TTL so that routing loops don't tear the whole net down. It's designed to sorta work even if the underlying fabric is a mess. None of that makes it a bad protocol, just the wrong choice for a reliable, low-latency HPC fabric that's under a single administrative authority.

      The fact that TCP/IP works as well as it does on a fabric that is nearly the opposite of what it is designed for speaks volumes. The fact that it is used on so many clusters demonstrates that people are willing to pay a substantial performance penalty in exchange for an open, well-understood, and easy-to-program network. That's a big reason that IP over IB exists at all.

      I see about 50-60% for GigE using TCP/IP as well. I only see >90% with alternate protocols. Most I have talked to report that IP over IB where IB appears to the IP stack as an ethernet like device barely outperforms GigE. That's not surprising either. I suspect the IP stack is the problem there as well.

  12. Re:Third fastest what? by antifoidulus · · Score: 3, Informative

    Doubtful. IBM's BlueGene is the king right now(well for the time being), but I don't see Big Mac(either version) beating the earth sim. Still, 2 out of the top 4 isn't bad.

  13. That article is obsolete by Wesley+Felter · · Score: 3, Insightful

    Back in 2002, people were pitching IB as a replacement for PCI. Today, nobody tries to do that -- IB and PCI are used for different purposes (clustering and I/O expansion, respectively).

    1. Re:That article is obsolete by sirsnork · · Score: 3, Informative

      Actually IB is VERY closely related to PCI Express. At one point they were the same thing and that was called 3GIO by Intel

      --

      Normal people worry me!
  14. the gigE card is more interesting by Twid · · Score: 2, Informative

    Small Tree also makes cool multiport gigabit Ethernet cards that support 802.3ad bonding (link aggregation). Really, the gigE cards are the more interesting thing for most of us who don't have a supercomputing cluster to run. The two-port version is less than $300. They work on Linux as well.

    http://small-tree.com/mp_cards.htm

    Gigabit has a latency of about 100 microseconds and realistic throughput of about 50MB/s. Infiniband has a latency of about 15 microseconds and a throughput of about 500MB/s.
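Those two numbers per interconnect plug into the usual first-order model of message cost, time = latency + size / bandwidth. A minimal sketch using the figures quoted above; the model ignores contention, protocol overhead, and CPU load.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """First-order cost of one message: fixed latency plus size over bandwidth.

    With MB = 10**6 bytes, 1 MB/s moves exactly 1 byte per microsecond.
    """
    return latency_us + message_bytes / bandwidth_mb_s


GIGE = (100.0, 50.0)  # latency in us, throughput in MB/s, from the comment
IB = (15.0, 500.0)

for size in (8, 1_000_000):
    g = transfer_time_us(size, *GIGE)
    i = transfer_time_us(size, *IB)
    print(f"{size:>9} bytes: GigE {g:9.1f} us, IB {i:9.1f} us, ratio {g / i:.1f}x")
```

With these inputs, an 8-byte message is about 6.7 times faster over InfiniBand because latency dominates, while a 1 MB transfer approaches the 10x bandwidth ratio.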

    I mostly sell small Apple workgroup clusters of 16 nodes, and these are almost always just a gigE backbone. There are certain classes of problems that can benefit from Infiniband at low node counts, but for the most common apps, like gene searching using BLAST, gigE is just fine.

    --
    - "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
  15. Not 3rd fastest by not_hylas(+) · · Score: 2, Informative

    October 14, 2004 Pg. 54

    http://www.netlib.org/benchmark/performance.pdf

    http://appleturns.com/scene/?id=4980

    "Calm down, Beavis; take a closer look at the third and fourth entries and you'll realize that they're the same exact cluster, before and after its owners added another 64 processors to it. In much the same way, System X is also listed in the seventh, ninth, and eleventh slots, with scores taken at various points along its journey to life as a complete 1,100-Xserve system. Factor out the doubles and, barring an "October Surprise," System X ought to sit in fifth place, under an Alpha cluster, a new Itanium2 system, the once-mighty Earth Simulator, and the new top dog, that chunk of IBM's unfinished BlueGene. Woo-hoo, PowerPCs in two of the top five! No other chip can say that."

    --
    ~hylas
  16. Permanent link. by Xenex · · Score: 3, Informative
  17. Re:I'm curious by sjames · · Score: 2, Informative

    Can one program them in Python or Perl, or only in "Real Programmers(tm)" languages like Java and C++?

    They can be programmed in any language. Fortran and C are by far the most common choices. It's common to see Perl and shell scripts used as glue between standalone modular programs. It's about the only place where you'll still occasionally find hand assembly in the inner loops, though that's becoming less common as more compilers support MMX, SSE, etc. instructions.

    You won't find a lot of interpreted languages doing heavy lifting in HPC. While a typical server is I/O bound (disk or net) and so can spare CPU cycles on an interpreted language, in HPC the CPU is normally pegged.

  18. Re:adjust your attitude, please by Kalak · · Score: 2, Insightful

    This is the Apple section of slashdot. These sections are present for a reason. Apple policies weren't being discussed at all, but InfiniBand policies were. Given that drivers are now released for InfiniBand for OS X, the question of what this brings to Apple clusters is something relevant to be discussed here. The reason I would brand you a troll is that you're speaking negatively about something that is not relevant to this section of /.: how Linux clusters are affected by IP issues decided by InfiniBand. You have yet to frame this in terms of contrasting how their policies affect Linux with how Apple drivers have now been released. If you had started that way, then perhaps I would have just watched the discussion.

    To say I don't want to hear what you have to say has already been proven wrong, as I have previously suggested that you start such a discussion in the appropriate section of /. at the appropriate time (when the content on LWN becomes publicly available). I all but wrote the submission for you! That hardly sounds like I'm trying to keep you from discussing this at all. I've even suggested broadening your audience.

    Also, to say I've made up my mind about InfiniBand or Linux v. Apple clusters couldn't be further from the truth. I haven't said anything about what I prefer, what is best, or anything to indicate my opinion. In fact, I have no informed opinion on the subject at all, much less have I expressed one. As a matter of fact, where is my direct put-down of Linux-based clusters? The only comment I made was that clusters likely to look at InfiniBand are not small-scale, hobbyist clusters, but more likely larger clusters with larger budgets, so whether the specs are free or not will be less of a factor.

    --
    I am, and always will be, an idiot. Karma: Coma (mostly effected by .hack)