Slashdot Mirror


Ethernet The Occasional Outsider

coondoggie writes to mention an article at NetworkWorld about the outsider status of Ethernet in some high-speed data centers. From the article: "The latency of store-and-forward Ethernet technology is imperceptible for most LAN users -- in the low 100-millisec range. But in data centers, where CPUs may be sharing data in memory across different connected machines, the smallest hiccups can fail a process or botch data results. 'When you get into application-layer clustering, milliseconds of latency can have an impact on performance,' Garrison says. This forced many data center network designers to look beyond Ethernet for connectivity options."

31 of 169 comments

  1. Long Live! by Anonymous Coward · · Score: 5, Funny

    Long Live the Token Ring!

    One Ring to rule them all

    1. Re:Long Live! by EnderWiggnz · · Score: 2, Interesting

      yes, people (mostly the government) do have token ring setups.

      the funniest part is that i've done work for naval ships that required 10base2. You know... CheaperNet!

      --
      ... hi bingo ...
    2. Re:Long Live! by MrSquirrel · · Score: 3, Funny

      I saw students bring in computers with token-ring cards when I worked at a University Helpdesk. They would come in and say "My computer's broken, I plugged 'the internet' in but it won't connect" (we would troubleshoot over the phone and they would want us to come up to their room; after much repeating of our policies they would cave and bring it down because they wanted to download their pr0n). I was baffled when it would turn out to be a token-ring card... I was like "Where the HELL did they get this?". I'm convinced it's part of the worldwide conspiracy to drive me insane.

      --
      A computer once beat me at chess, but it was no match for me at kick boxing.
    3. Re:Long Live! by myth24601 · · Score: 2

      ARCNET is the tank of network protocols. I was once working on an ARCNET system and I tripped over the cable and yanked it out of the wall. Would you believe the token jumped out of the cable, ran across the floor and jumped into the wall?

      Nothing stops ARCNET!

      --
      No matter where you go, there you are.
    4. Re:Long Live! by Suzuran · · Score: 2

      I had a teacher once who ran ARCNET over a section of barbed-wire fence, just to prove it would work. It worked for about a week, even through a rainstorm (though that made it drop some packets), until he got bored of it and took it apart.

  2. My idea: a vat of salt water & CAT5 by Anonymous Coward · · Score: 5, Funny

    In our Data Center, we have a great big vat of steaming salt water and we drop one end of the cat5 cables from each server into the vat....those packets that can't figure out where they're going just drop to the bottom and die ...we have to drain this packet-goo out once a month. (but we do recycle it...we press it into CDs and sell them on Ebay)

    (Seriously, haven't people heard of cut-through switches, which just look at the first part of the header and switch based on that? Store-and-forward switches are so "1990s".)

    TDz.

  3. 30 GB? Take that NSA and your outdated 622MB! by Marxist+Hacker+42 · · Score: 2, Interesting

    The NSA's network sniffer, recently discovered at an AT&T broadband center, can only sniff up to 622MB. Sounds to me like if you use an InfiniBand switch, that would effectively make the output of the NSA's network sniffers complete gibberish.

    --
    SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
  4. 100ms ethernet latency? by victim · · Score: 5, Informative

    I don't think I need to read anymore, well, I did verify that the number really appears in the article.
    This author does not understand the subject material.

    (I suppose you could deliberately overload a switch enough to get this number, maybe, but that would be silly, and your switch would need 1.25Mbytes of packet cache.)
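The 1.25 Mbyte figure above looks like a plain bandwidth-times-delay product. A minimal sketch of that arithmetic, assuming a 100 Mbit/s port (the comment does not state the link speed, so that value is an assumption) and the article's 100 ms figure:

```python
def buffer_bytes_for_delay(link_bits_per_s: float, delay_s: float) -> float:
    """Bytes that must sit queued on a port to add `delay_s` of queuing delay."""
    return link_bits_per_s * delay_s / 8


if __name__ == "__main__":
    # Assumed inputs: a 100 Mbit/s port and 100 ms of latency.
    needed = buffer_bytes_for_delay(100e6, 0.100)
    print(f"{needed / 1e6:.2f} MB of queued frames")
```

At gigabit speed the same delay would need ten times the buffer, which is why the number only makes sense for a badly overloaded switch.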

    1. Re:100ms ethernet latency? by merreborn · · Score: 5, Informative

      Looks like the author fucked up the definition of millisecond too:

      "By comparison, latency in standard Ethernet gear is measured in milliseconds, or one-millionth of a second, rather than nanoseconds, which are one-billionth of a second"

      http://www.google.com/search?hl=en&q=define%3Amillisecond&btnG=Google+Search
      "One thousandth of a second"

      Seriously. How the fuck does this idiot get published?

  5. Not an Auspicious Start by Anonymous Coward · · Score: 5, Informative
    From the article, three paragraphs in:
    "(By comparison, latency in standard Ethernet gear is measured in milliseconds, or one-millionth of a second, rather than nanoseconds, which are one-billionth of a second)"

    That would be one-thousandth, not millionth (aka micro second). Not a good start...

  6. When you get to many hops by with_him · · Score: 5, Funny

    I just blame it on the ether-bunny.

  7. Software design by nuggz · · Score: 2, Interesting

    The original post makes some comments that
    sharing memory ... the smallest hiccups can fail a process or botch data results.
    Sounds like bad design, or a known design trade off.
    Quite reasonable: when on a slow link, assume the data I have is correct until I know better; if it isn't, throw it out and start over. Not wildly different from branch prediction or other approaches to this type of information.

    'When you get into application-layer clustering, milliseconds of latency can have an impact on performance,'
    Faster is faster, not really a shocking concept.

    1. Re:Software design by Amouth · · Score: 2, Funny

      what it looks like to me is.. ok so they set something up using normal 100/1000 ethernet and then realized something was slow and that if they use gbic 30gb ports things run faster... can someone please send them a cookie?

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
  8. Did you mean "microseconds"? by pla · · Score: 3, Interesting

    The latency of store-and-forward Ethernet technology is imperceptible for most LAN users -- in the low 100-millisec range.

    I don't know what sort of switches you use, but on my home LAN, with two hops (including one over a wireless bridge) through only slightly-above-lowest-end DLink hardware, I consistently get under 1ms.



    When you get into application-layer clustering, milliseconds of latency can have an impact on performance

    Again, I get less than 1ms, singular.



    Now, I can appreciate that any latency slows down clustering, but the ranges given just don't make sense. Change that to "microseconds", and it would make more sense. But Ethernet can handle single-digit-ms latencies without breaking a sweat.

    1. Re:Did you mean "microseconds"? by dlapine · · Score: 2, Informative
      Sure, for an 8 port switch, where all the computers have a direct connection. Consider the issues involved for a router with 128 machines all trying to cross-communicate. Or larger collections of computers that might need to use multiple sets of switches to span the entire system.

      On a Force10 switch, with 2 nodes on the same blade:
      tg-c844:~ # ping tg-c845
      PING tg-c845.ncsa.teragrid.org (141.142.57.161) from 141.142.57.160 : 56(84) bytes of data.
      64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=1 ttl=64 time=0.148 ms
      64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=2 ttl=64 time=0.146 ms
      64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=3 ttl=64 time=0.145 ms
      64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=4 ttl=64 time=0.144 ms

      The same nodes using a myrinet connection:
      tg-c844:~ # ping tg-c845-myri0
      PING tg-c845-myri0.ncsa.teragrid.org (172.22.57.161) from 172.22.57.160 : 56(84) bytes of data.
      64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=1 ttl=64 time=0.051 ms
      64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=2 ttl=64 time=0.044 ms
      64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=3 ttl=64 time=0.044 ms
      64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=4 ttl=64 time=0.043 ms

      The latency gets below 10 usec with the use of special drivers; this is just using the 2.4 Linux TCP stack. What's even scarier about the Myrinet is that I can have all 900+ machines talking at the same time with no drop in latency: we have that network spec'd for full bisection bandwidth. Try that with 900 nodes on a gigE network, let alone 100baseT.

      As was mentioned here earlier, ethernet is nice for networks that change. Once you have a significant number of machines attached, and the number of switches and routers gets past 1, ethernet loses its equivalence in latency.
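To put a number on the two ping runs above, here is a small sketch that pulls the `time=` values out of the pasted output and averages them. The helper name `rtts_ms` is made up for illustration; the sample lines are copied from the comment.

```python
import re
from statistics import mean

# Reply lines copied from the two ping runs in the comment above.
ETHERNET = """\
64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=1 ttl=64 time=0.148 ms
64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=2 ttl=64 time=0.146 ms
64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=3 ttl=64 time=0.145 ms
64 bytes from tg-c845.ncsa.teragrid.org (141.142.57.161): icmp_seq=4 ttl=64 time=0.144 ms
"""

MYRINET = """\
64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=3 ttl=64 time=0.044 ms
64 bytes from tg-c845-myri0.ncsa.teragrid.org (172.22.57.161): icmp_seq=4 ttl=64 time=0.043 ms
"""


def rtts_ms(ping_output: str) -> list[float]:
    """Extract round-trip times (in ms) from ping reply lines."""
    return [float(t) for t in re.findall(r"time=([\d.]+) ms", ping_output)]


if __name__ == "__main__":
    eth = mean(rtts_ms(ETHERNET))
    myri = mean(rtts_ms(MYRINET))
    print(f"ethernet mean RTT: {eth * 1000:.1f} us")
    print(f"myrinet  mean RTT: {myri * 1000:.1f} us")
    print(f"ratio: {eth / myri:.1f}x")
```

Four samples per link is a tiny sample, so treat the ratio as a rough indication only.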

      --
      The Internet has no garbage collection
  9. Milliseconds? by rubmytummy · · Score: 2, Funny
    On my planet, a millisecond is a full thousandth of a second, not just one millionth.

    Oh, well. People tell me I'm just slow.

  10. Didn't RTFA? -Infiniband, FC and Myrinet beat Eth0 by hguorbray · · Score: 4, Interesting

    Actually, even with Gigabit ethernet availability HPTC and other network intensive data center operations have moved to Fibre Channel and things like:

    Infiniband http://en.wikipedia.org/wiki/Infiniband

    and Myrinet http://en.wikipedia.org/wiki/Myrinet

    http://h20311.www2.hp.com/HPC/cache/276360-0-0-0-121.html
    HP HPTC site

    -What's the speed of dark?

  11. Store & Forward ONLY for 10 to 100 to 1,000. by khasim · · Score: 3, Informative

    There are only TWO reasons to use Store & Forward.

    #1. You're running different speeds on the same switch (why?).

    #2. You really want to cut down on broadcast storms (just fix the real problem, okay?)

    Other than that, go for the speed! Full duplex!

  12. No kidding by ShakaUVM · · Score: 4, Interesting

    Er, yeah. No kidding.

    When I was writing applications at the San Diego Supercomputer Center, latency between nodes was the single greatest obstacle to getting your CPUs to run at their full capacity. A CPU waiting to get its data is a useless CPU.

    Generally speaking, clusters that want high performance use something like Myrinet instead of ethernet. It's like the difference between consumer, prosumer, and professional products you see in, oh, every industry across the board.

    As a side note, many parallel apps address the latency issue by overlapping their communication and computation phases instead of running them as discrete phases; this can greatly reduce the time a CPU is idle.

    The KeLP kernel does overlapping automatically for you if you want: http://www-cse.ucsd.edu/groups/hpcl/scg/kelp.html
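The overlapping idea above can be sketched in a few lines. This is not how KeLP does it; it is a generic illustration in Python where `fetch_block` and `compute` are hypothetical stand-ins (a sleep plays the part of the network wait and of the number crunching).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_block(i, delay=0.02):
    """Hypothetical stand-in for receiving block i from another node."""
    time.sleep(delay)
    return list(range(i * 4, i * 4 + 4))


def compute(block, delay=0.02):
    """Hypothetical stand-in for the per-block computation."""
    time.sleep(delay)
    return sum(block)


def run_discrete(n_blocks):
    """Communicate, then compute, strictly one after the other."""
    return [compute(fetch_block(i)) for i in range(n_blocks)]


def run_overlapped(n_blocks):
    """Request block i+1 in the background before computing on block i."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, 0)
        for i in range(n_blocks):
            block = pending.result()          # wait only if the data isn't here yet
            if i + 1 < n_blocks:
                pending = pool.submit(fetch_block, i + 1)
            results.append(compute(block))    # runs while the next fetch is in flight
    return results


if __name__ == "__main__":
    for fn in (run_discrete, run_overlapped):
        start = time.perf_counter()
        fn(10)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")
```

Both versions return the same results; the overlapped one just hides most of the fetch time behind the compute time when the two are of similar length.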

  13. Re:Low-cost options? by dlapine · · Score: 2, Informative
    Define low cost? Myrinet with less than 10 microsecond latency is normally considered to be the least expensive option. You can check their price lists, but an 8 port solution (with 8 HBAs) will set you back over $8k, not including the fiber.

    For some people, that's cheap. If not, sorry.

    --
    The Internet has no garbage collection
  14. The worst post! by Anonymous Coward · · Score: 3, Informative

    I wonder what's happening to slashdot. That's as bad as technical news can get. Ethernet latency -- 100ms?? Typical Ethernet latencies are around a few hundred microseconds. Even the ping round-trip time from my machine to google.com is about 20ms.

    $ ping google.com
    PING google.com (64.233.167.99) 56(84) bytes of data.
    64 bytes from 64.233.167.99: icmp_seq=1 ttl=241 time=20.1 ms
    64 bytes from 64.233.167.99: icmp_seq=2 ttl=241 time=19.6 ms
    64 bytes from 64.233.167.99: icmp_seq=3 ttl=241 time=19.5 ms

    What a shame that such a post is on the front page of slashdot! Someone please s/milli/micro.

  15. For performance, run the same speed. by khasim · · Score: 4, Interesting
    People run different speeds on the same switch all the time, and for not necessarily poor reasons: If you have a SMB (in this case, that's small or medium business) with maybe one big fileserver, you don't need to run gigabit to everyone...
    What's with the "need to"?

    I'm talking performance. Store & Forward hammers your performance. In my experience, you get better performance when you run the server at 100Mb full duplex (along with all the workstations) and use Cut Through than if you have the server on a Gb port, but run Store & Forward to your 100Mb workstations.
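A rough way to see the gap between the two switching modes is to compare how many bytes each has to receive before it can start transmitting. The sketch below assumes a maximum-size 1518-byte frame, assumes cut-through only needs the 6-byte destination MAC, and ignores preamble, lookup time and queuing, so the results are lower bounds rather than measurements.

```python
def serialization_delay_us(n_bytes: int, link_bits_per_s: float) -> float:
    """Time in microseconds to clock `n_bytes` onto or off a link."""
    return n_bytes * 8 / link_bits_per_s * 1e6


FRAME_BYTES = 1518   # assumed: maximum untagged Ethernet frame
DST_MAC_BYTES = 6    # assumed: cut-through decides after the destination MAC

if __name__ == "__main__":
    for name, rate in (("100 Mb/s", 100e6), ("1 Gb/s", 1e9)):
        saf = serialization_delay_us(FRAME_BYTES, rate)
        cut = serialization_delay_us(DST_MAC_BYTES, rate)
        print(f"{name}: store-and-forward >= {saf:.2f} us, cut-through >= {cut:.2f} us per hop")
```

The store-and-forward penalty is paid at every hop, so it adds up as the number of switches in the path grows.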
  16. Slashdot summary wrong, actual article is better by m.dillon · · Score: 3, Interesting

    The slashdot summary is wrong. If you read the actual article the author has it mostly correct except for one comment near the end.

    Ethernet latency is about 100uS through a gigE switch, round-trip. A full-sized packet takes about 200uS (micro seconds), round-trip. Single-ended latency is about half of that.

    There are proprietary technologies that have much faster interconnects, such as the infiniband technology described in the article. But the article also mentions the roadblock that a proprietary technology represents compared with a widely-vendored standard. The plain fact of the matter is that ethernet is so ridiculously cheap these days that it makes more sense to solve the latency issue in software, for example by designing a better cache coherency management model and by designing better clustered applications, than it does with expensive proprietary hardware.

    -Matt

  17. Re:Channel Bonding by kjs3 · · Score: 4, Funny

    So you have an environment with requirements totally unlike the ones described in the article and needing none of the solutions illustrated in the article. Hey...thanks for letting us know. Maybe the other million Slashdot users with environments irrelevant to the post can let us know what they have as well.

  18. Ethernet Problems, IB problems, etc by mrjimorg · · Score: 2, Interesting

    Note: I do have a dog in this fight.
    One thing that isn't mentioned in the article is the amount of CPU power required to send out ethernet packets. The typical rule is that 1 GHz of processing power is required to send 1 Gb/s of data on the wire. So, if you want to send 10 Gb/s of data, you'd need 10 GHz of processor: a pretty steep price. Some companies have managed to get this down to 1 GHz per 3 Gb/s, and one startup (NetEffect) is now claiming roughly 0.1 GHz for ~8 Gb/s on the wire, using iWARP. With this, your system can be processing information rather than creating packets.
    The problem with Infiniband, Myrinet, etc. is that they require another card in your system (and associated heat problems, size issues, etc.), special switches and equipment, and new training for your staff on how to get it up and going. However, iWARP, which is based on TCP/IP, can use your standard DHCP, ping, tracert, ipconfig, etc. and can allow a single card to be used for networking to the outside world (TCP/IP), clustering in the datacenter (iWARP), and storage (iSCSI). 1 card, no special new software widgets, 10 Gb speeds.
    However, you can't go and buy an iWARP card from Fry's today. Although, you can't buy an infiniband or Myrinet card there either.
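The rule of thumb above is easy to turn into arithmetic. The ratios below are taken straight from the comment (including the vendor claim, which is unverified here); the function name is made up for illustration.

```python
def cpu_ghz_needed(throughput_gbps: float, ghz_per_gbps: float) -> float:
    """CPU clock budget consumed by the network stack at a given throughput."""
    return throughput_gbps * ghz_per_gbps


# GHz of CPU per Gb/s on the wire, as quoted in the comment above.
RATIOS = {
    "typical rule (1 GHz per 1 Gb/s)": 1 / 1,
    "improved stacks (1 GHz per 3 Gb/s)": 1 / 3,
    "claimed for iWARP (0.1 GHz per 8 Gb/s)": 0.1 / 8,
}

if __name__ == "__main__":
    for label, ratio in RATIOS.items():
        print(f"{label}: {cpu_ghz_needed(10, ratio):.3f} GHz for 10 Gb/s")
```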

  19. Tolkien ring by Shabazz+Rabbinowitz · · Score: 4, Funny

    I had recently considered using this Tolkien ring until I found out that deinstallation is very difficult. Something about having to take it to a smelter.

  20. Re:Slashdot summary wrong, actual article is bette by trollogic · · Score: 2, Insightful

    I think you have no clue what you're saying.
    1) InfiniBand is an open standard hosted by the IBTA, which is a consortium of companies. The spec is available to anyone who wants to understand or build InfiniBand hardware. Not being an IEEE standard does not make it proprietary.
    2) The major roadblock with 10Gbps is physics. You can only reach so far with copper without retiming the signal. And optics are expensive. 10 GbE has the same problem and it won't be cheap any time soon.
    3) InfiniBand has already reached a volume where on-board IB chips are available in the $70-80 range .. 10 GbE is nowhere close. And IB DDR will be shipping next month (20 Gbps wire / 16 Gbps data).
    4) Beowulfs are popular for a reason .. Cache Coherency is a bitch.
    5) A round trip node-to-node latency in IB is 2.7 usecs (best case of course). With all the optimization in the world, you won't be able to get ethernet anywhere near that number.
    6) InfiniBand is being WIDELY deployed. Sandia Thunderbird is a 9216 processor IB fabric in production. NCSA has Tungsten2, which is a 1024 processor IB fabric. NCSA also has a Microsoft Windows Cluster running CCE over IB with 880 processors. There are several large firms (Oil&Gas, BioTech, Banks, Market Data houses) which run several large multi-hundred/multi-thousand processor IB clusters.
    7) Just as with any technology, it will take time for new technologies to be accessible to the masses .. so don't write off anything yet.
    8) Do your research before you open your mouth.

  21. Maybe useless info: TOP500 interconnect statistics by MojoStan · · Score: 2, Informative
    Generally speaking, clusters that want high performance use something like Myrinet instead of ethernet. It's like the difference between consumer, prosumer, and professional products you see in, oh, every industry across the board.
    That reminded me of the TOP500's statistics generator, so I just had to look up the current list's (November 2005) statistics for "interconnect family". For those that are curious:

    • Myrinet is the second most-used interconnect in the TOP500 at 14% (70 out of 500) followed by HyperPlex at 6% (31).
    • Gigabit ethernet is by far the most used interconnect at 50% (249).

    In the TOP500, it looks like ethernet is not yet an "outsider." Perhaps in the "top 100."

    --
    TO START
    PRESS ANY KEY

    Where's the 'ANY' key? I see Esk, Kitarl, and Pig-Up...

  22. For those who don't understand... by bill_kress · · Score: 3, Informative

    Most (all?) Ethernet hardware reads in an entire packet, looks at it, then sends it on to a destination. This makes building routers and switching hardware fairly easy but extremely slow.

    If you go to a high-speed network, what you get is a packet being forwarded as it's being read. By the time the first few bits are through the switch, it should be able to figure out the next hop and have the packet moving in that direction. Phone companies have huge problems with the delays in Ethernet. This is why the ATM protocol was invented; it's hard to use, awkward and not too graceful, but it can fly through a switching network like nobody's business.

    Ethernet is also extremely sloppy--Any switch along the way is allowed to throw a packet away and wait for the originator to resend, causing a HUGE hiccup in the communication stream (Most if not all routers do this whenever an address is not in its forwarding table yet).

    IIRC the faster protocols see a "Routing" packet in the stream and set up forwarding hardware before getting the actual packet/stream, then wait until the end of the packet (or entire stream) to tear the route down again.

    Ethernet, however, due to its simplicity, is bridging the gaps. It's a pretty crappy protocol in general, but we keep throwing better, smarter hardware at it in an effort to brute-force it into the parameters we require. (I work for a company that makes Ethernet over fiber hardware, and have worked for companies based around ATM, SONET and other interesting solutions).

    I guess the point of the article was to remind a world that is coming to believe that ethernet is the end-all be-all of networking that it was always just the simplest hack available and therefore the easiest to deal with.

    Just like SNMP.

    1. Re:For those who don't understand... by servanya · · Score: 2
      Most (all?) Ethernet hardware reads in an entire packet, looks at it, then sends it on to a destination. This makes building routers and switching hardware fairly easy but extremely slow.


      First, Ethernet doesn't forward packets. It forwards frames.
      Most (all?) ethernet switches read just the destination MAC and start forwarding it, just as you've described in the next paragraph. If it can't, because there's no bridge table entry for the destination, it floods the frame.


      Ethernet is also extremely sloppy--Any switch along the way is allowed to throw a packet away and wait for the originator to resend, causing a HUGE hiccup in the communication stream (Most if not all routers do this whenever an address is not in its forwarding table yet).


      Don't start confusing people with L2/L3 comparisons. A router will drop a packet that it can't forward, but most routers that unknowledgeable people deal with will have some sort of default route, so that never happens. In layer 2 land, however, frames are FLOODED when the destination is unknown.
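The layer 2 behaviour described above (learn the source, forward if the destination is known, flood if it isn't) fits in a few lines. This is a toy model for illustration only: the class and method names are made up, and it ignores VLANs, table ageing and spanning tree.

```python
class LearningSwitch:
    """Toy model of a transparent bridge's forwarding decision."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # bridge table: MAC address -> port it was last seen on

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame is sent out of."""
        self.table[src_mac] = in_port        # learn where the sender lives
        out_port = self.table.get(dst_mac)
        if out_port is None:
            return self.ports - {in_port}    # unknown destination: flood
        if out_port == in_port:
            return set()                     # destination is on the same segment: filter
        return {out_port}                    # known destination: forward to one port


if __name__ == "__main__":
    sw = LearningSwitch([1, 2, 3])
    print(sw.handle_frame(1, "aa", "bb"))  # "bb" not learned yet, so the frame is flooded
    print(sw.handle_frame(2, "bb", "aa"))  # "aa" was learned on port 1
    print(sw.handle_frame(1, "aa", "bb"))  # "bb" is now known on port 2
```

Nothing is dropped for lack of a table entry; the worst case is that the frame goes to more ports than it needed to.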

      IIRC the faster protocols see a "Routing" packet in the stream and set up forwarding hardware before getting the actual packet/stream, then wait until the end of the packet (or entire stream) to tear the route down again.


      I think I might have an idea what you're talking about here, but it's hard to tell.
  23. Re:sharing memory over ethernet? by mrchaotica · · Score: 2, Interesting
    Maybe I should RTFA...
    Either that, or you should take the class that I took this past semester. There's a bunch of links to research papers and lecture slides about distributed shared memory (and other kinds of parallel/shared computing issues), if you care to read them.
    --

    "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz