Finding the Bottleneck in a Gigabit Ethernet LAN?
guroove asks: "I have a small gigabit ethernet network at home, and I spent a lot of money getting gigabit NICs for all my computers and even bought Cat 6 cabling. I only have 3 computers on the gigabit network (a Mac, a Windoze machine, and a Linux box) so instead of getting a switch, I triple NIC'd the Linux box, which I use as a gateway and a file server. After the network was complete, I wasn't satisfied with transfer rates, so I started a transfer of a very large file and found that the transfer rate was topping out at just over 145 Mbps (which is a far cry from 1000 Mbps). I'm wondering now where my bottleneck is. Is it the NICs? Are all gigabit NICs really giving us 1000 megabits per second? Is it the driver? Is it Samba? Could it be that the hard drives aren't fast enough? Does anyone have experience with gigabit home networking enough to know where the bottlenecks are? Does the current PCI technology even allow for bandwidth that high?"
It is entirely possible that the Linux machine is acting as a router, switching all your traffic in C code. Not to mention it is probably sending traffic across the PCI bus twice, once at ingress and once at egress. The lookup of the IP destination address is probably using a whole lot of memory bandwidth, and if it's at all like a regular router, it's probably doing a full IP header sanity check (verifying the IP header checksum and the version number) plus a TTL decrement. After the TTL decrement, you need to recompute the header checksum. I would say the Linux machine is your bottleneck, unless you could somehow get it configured as an ethernet switch (a bridge) rather than a Layer 3 router.
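The per-packet work described above can be sketched in a few lines. This is only an illustration of the IPv4 header checksum and TTL handling, not how the Linux kernel actually implements forwarding; the function names and the sample header bytes are made up for the example.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """16-bit ones'-complement checksum over an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(header: bytes) -> bytes:
    """Sanity-check a header, decrement the TTL, recompute the checksum."""
    if header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # A valid header, checksum field included, sums to zero.
    if ipv4_checksum(header) != 0:
        raise ValueError("bad header checksum")
    ttl = header[8]
    if ttl <= 1:
        raise ValueError("TTL expired, packet would be dropped")
    out = bytearray(header)
    out[8] = ttl - 1
    out[10:12] = b"\x00\x00"  # zero the checksum field before recomputing
    struct.pack_into("!H", out, 10, ipv4_checksum(bytes(out)))
    return bytes(out)

# Hypothetical 20-byte header (192.168.0.1 -> 192.168.0.199, TTL 64, UDP).
SAMPLE = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
forwarded = forward(SAMPLE)
print("TTL", SAMPLE[8], "->", forwarded[8])
```

Every routed packet pays for this in software, on top of the route lookup and the two trips across the bus.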
Your speed is also dependent on protocol, driver, and OS overhead. Check those things before you worry about such a simple hardware setup.
You didn't give any information about your protocol so that leads me to believe that you haven't considered TCP vs. UDP, for example.
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
A gigabit is about 125 megabytes
If you are getting 145 megabytes / second, that's damn good.
Um... how about the obvious? How fast are the hard drives in both computers? 145 Mbps = ~18 MB/s, which is approaching the sustained limit for many ATA100/133 drives these days.
So I would start there.
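For anyone mixing up bits and bytes, the conversion is just a factor of eight. A minimal sketch (the helper names are made up, and decimal units are assumed, so 1 MB = 1,000,000 bytes):

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Megabits per second -> megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8

def mbyte_to_mbit(mbyte_per_s: float) -> float:
    """Megabytes per second -> megabits per second."""
    return mbyte_per_s * 8

print(mbit_to_mbyte(145))    # the reported transfer rate, in MB/s
print(mbit_to_mbyte(1000))   # a saturated gigabit link, in MB/s
```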
On your best day a random IDE drive is going to read or write 30 megs a second (average, on the fairly high side for anything short of SATA or nice SCSI) for completely sequential data in a large contiguous file; that's 240 megabits maximum throughput at the drive heads, or effectively 'wire speed'. That's assuming you are using relatively new hard drives in all these machines.
Throw in all the Samba and other protocol overhead, throw in the fact that you probably aren't running P4 3.2GHz boxes, in fact maybe much less, throw in the lack of a dedicated switch and all of a sudden getting 50% of your theoretical peak throughput (hard drive being the limiting factor, not network) isn't too harsh of a reality.
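Put as arithmetic: the end-to-end rate can't exceed the slowest stage in the chain. A rough sketch using the figures from this thread (30 MB/s disk, gigabit link, 145 Mbit/s observed); the stage names and the helper are made up for illustration.

```python
def find_bottleneck(stages_mbit: dict) -> tuple:
    """Return (name, rate) of the slowest stage; rates are in Mbit/s."""
    name = min(stages_mbit, key=stages_mbit.get)
    return name, stages_mbit[name]

stages = {
    "IDE disk (30 MB/s)": 30 * 8,  # 240 Mbit/s at the drive
    "gigabit link": 1000,
}
name, ceiling = find_bottleneck(stages)
observed = 145  # Mbit/s, as reported in the question
print(f"ceiling: {ceiling} Mbit/s, set by the {name}")
print(f"observed is {observed / ceiling:.0%} of that ceiling")
```

Protocol and CPU overhead then come out of that ceiling, not out of the 1000 Mbit/s on the box.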
And it's a 'Windows' box, you stupid fuck. Maybe if Linux users (yea, I'm posting this in Mozilla on a RH9 install) would grow up and learn to spell the word 'Windows', Corporate America wouldn't instantly dismiss Linux users as a bunch of fucking retards. I spend a part of my work day trying to convince my boss that Linux is the choice of a new generation of professionals and every time someone says M$ or 'Windoze' I have to start over from ground zero. If you aren't part of the solution, you are part of the problem.
Glonoinha the MebiByte Slayer
As a lot of people have pointed out, off-the-shelf PCs aren't a good choice for gigabit Ethernet switching and routing regardless of the OS, and you can't really take advantage of true full-duplex Gigabit Ethernet on a standard consumer PCI bus. Still, you can do better than 145 Mbit/sec.
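To put numbers on the PCI limit: the sketch below assumes the conventional 32-bit, 33 MHz PCI bus found in most consumer boxes, which is shared by every card on it. These are theoretical peaks; real sustained rates are lower. The function names are made up for the example.

```python
PCI_BUS_WIDTH_BITS = 32   # assumption: conventional 32-bit PCI
PCI_CLOCK_MHZ = 33.33     # assumption: nominal 33 MHz clock

def pci_peak_mbit() -> float:
    """Theoretical peak of the shared bus, in Mbit/s."""
    return PCI_BUS_WIDTH_BITS * PCI_CLOCK_MHZ

def bus_demand_mbit(link_mbit: float, crossings: int = 2,
                    full_duplex: bool = False) -> float:
    """Bus traffic needed to forward one link's worth of data.

    Each forwarded byte crosses the bus `crossings` times
    (in from one card, out to another).
    """
    directions = 2 if full_duplex else 1
    return link_mbit * crossings * directions

print(f"PCI peak:              {pci_peak_mbit():.0f} Mbit/s")
print(f"1 Gbit/s, one way:     {bus_demand_mbit(1000):.0f} Mbit/s")
print(f"1 Gbit/s, full duplex: {bus_demand_mbit(1000, full_duplex=True):.0f} Mbit/s")
```

So even a single one-way gigabit stream forwarded between two PCI cards asks for roughly twice what the bus can theoretically deliver.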
I've been using a Linksys EG008W switch on my home network, and it's a real bargain. It is a true switch (not a hub), costs less than $200, and all eight ports are capable of autosensing gigabit-capable hardware. Not all so-called "Gigabit" hubs are created equal: some of them only work in half-duplex mode, some of them only have gigabit capability on their uplink ports, and some of them slow down to 100 megabit/sec if any of their ports are connected to 100-megabit devices.
The Linksys's big drawback is its fan noise. It is insanely annoying. I owned mine for about 24 hours before I opened it up and dropped the voltage to the fan with a three-terminal regulator IC. I cut a hole in the top to improve the airflow at the lower fan speed, and it's perfectly unobtrusive now. (No, I don't remember what voltage I ended up running the fan at, unfortunately.) If you're either (a) deaf; (b) located at least a couple of rooms away from your network closet; or (c) handy with a soldering iron and indifferent to manufacturer warranties, the EG008W would be an ideal piece of hardware for your situation.
Dahlmann tightly grips the knife, which he may have no idea how to use, and steps out into the plain.
One of the best things you could do is configure SNMP on all 3 boxen. After that, run MRTG to figure out what's happening on the wire. If you made the connectors yourself (as opposed to using factory-made cables), double-check that the connectors fall within the Cat 6 spec. How much of each pair is untwisted? How far into the connector is the shield/jacket seated? Is the wire kinked, or does it have sharp bends anywhere? Is the wire running next to power? All these things can degrade the signal.
Get a good disk benchmark and run that on all 3 boxen. Find out if the disks can sustain traffic at the 1000 Mbit/s rate (125 MB/s isn't going to come from a single IDE disk). Also, keep in mind that the core logic switching between a PCI RAID card and a PCI NIC will eat up some bandwidth.
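If no benchmark tool is handy, a crude sequential-read timer gives a first number. This is only a rough sketch, not a substitute for a real benchmark: the function name is made up, and the OS page cache will inflate the result unless the file is much larger than RAM or has not been read recently.

```python
import sys
import time

def sequential_read_mb_per_s(path: str, block_size: int = 1024 * 1024) -> float:
    """Read a file front to back and return the rate in MB/s (decimal)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / 1e6 / elapsed

if __name__ == "__main__" and len(sys.argv) > 1:
    rate = sequential_read_mb_per_s(sys.argv[1])
    print(f"{rate:.1f} MB/s = {rate * 8:.0f} Mbit/s")
```

Point it at a large file on each box and compare the result against the 125 MB/s a saturated gigabit link would need.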
Finally, benchmark each link individually with a server benchmark tool. Put a 1 GB file on the Linux box and see how long it takes to transfer to each of the clients. Then send the same file from the clients back to the server.
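To take the disks and Samba out of the picture, a memory-to-memory TCP test is a useful companion to the file copy. The sketch below runs both ends in one process over loopback purely so it is self-contained; for a real test you would run the receiving half on one machine and the sending half on another. All names here are made up for the example.

```python
import socket
import threading
import time

def receive_all(server_sock: socket.socket, result: dict) -> None:
    """Accept one connection and count bytes until the sender closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            total += len(data)
    result["bytes"] = total

def measure_throughput(host: str = "127.0.0.1", total_bytes: int = 20_000_000):
    """Send total_bytes over TCP; return (bytes_received, Mbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()

    payload = b"\x00" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            chunk = payload[: min(len(payload), total_bytes - sent)]
            client.sendall(chunk)
            sent += len(chunk)
    receiver.join()  # wait until the receiver has seen everything
    elapsed = max(time.perf_counter() - start, 1e-9)
    server.close()
    return result["bytes"], result["bytes"] * 8 / 1e6 / elapsed

if __name__ == "__main__":
    received, mbit = measure_throughput()
    print(f"{received} bytes at {mbit:.0f} Mbit/s")
```

If this number is far above what the file copy achieves, the wire and NICs are not the problem.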
On a side note, SOHO GB switches shouldn't cost more than $100. But, if your disks cannot keep up with the rate, it won't matter.
On a side, side note, we have tools that show a lot of things about our hardware. Why are there no tools showing the used bandwidth of the PCI/AGP/memory bus? Troubleshooting this problem would be simple if you could see that a specific bus is being saturated.
I'd rather you do it wrong, than for me to have to do it at all.
The key here is "long range". This is one thing that network amateurs always get confused about: there is a hell of a difference between latency and bandwidth. TCP does not handle high-latency networks all that well (it sucks over satellite, for example). TCP does handle high bandwidth very well, but if the latency is high (because of the "long range" and "millisecond delays" in your example), TCP is not optimal at all.
Basically, with a high-latency network there is a lot of space in the pipe for packets, and TCP does not fill it up because the window is too small.
This has nothing to do with speed.
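The window effect is easy to put numbers on: a sender can have at most one window of data in flight per round trip. In the sketch below, the 65,535-byte window is the largest TCP allows without the window scaling option; the two round-trip times (500 ms for a satellite hop, 0.2 ms for a home LAN) are illustrative assumptions, not measurements.

```python
def window_limited_mbit(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput: one window per round trip, in Mbit/s."""
    return window_bytes * 8 / rtt_seconds / 1e6

def bandwidth_delay_product_bytes(link_mbit: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the pipe full."""
    return link_mbit * 1e6 * rtt_seconds / 8

WINDOW = 65_535  # bytes; maximum without window scaling

print(f"satellite, 500 ms RTT: {window_limited_mbit(WINDOW, 0.5):.2f} Mbit/s")
print(f"home LAN, 0.2 ms RTT:  {window_limited_mbit(WINDOW, 0.0002):.0f} Mbit/s")
print(f"bytes in flight to fill gigabit at 0.2 ms: "
      f"{bandwidth_delay_product_bytes(1000, 0.0002):.0f}")
```

On a LAN the window-limited ceiling sits well above gigabit, which is why the window is not what is holding a home network to 145 Mbit/s.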
The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism