Developing a New Beowulf Architecture?
"Both gigabit ethernet and Myrinet still have one fundamental weakness, a weakness that goes back to the original days of networking, they are a SERIAL medium. Even if you use the fastest technology possible you are still sending bits one at a time down a single pipe. it's like having a single lane highway between L.A. and San Francisco with each car running at 10,000 mph so that you can cope with the bandwidth, it might work but it's a damn silly solution. I therefore propose a new networking solution for use in cluster systems, parallel networking. This isn't as silly as it sounds because we use this solution at work to link two switches, two 100Mb network connections are concatenated together to form a single 200Mb link, but what I propose goes further.
The new system takes advantage of the seven-layer OSI model and separates the new hardware from the operating system. So far as the system is concerned each node has a single network card but the interface is where I propose the change. Every network card includes one or more shift registers which take the parallel information off the PCI bus and convert it to a serial bit stream so that it can be sent along the network cable and when data is received the hardware operates in reverse converting serial to parallel. The new cards replace these shift registers with thirty two (or maybe sixteen) bit latches and the network connector at the back of the card has (say) forty pins. This would allow the use of thirty two pins for data and eight for handshaking and if the new eighty-core IDE cables are used then crosstalk would not be a problem. It's a similar approach to the Digital Video Out connector on some high-end video cards that allow you to connect a flat screen monitor without going through the D to A convertors. Each node has its own cable connecting into the network switch which (as the connections are now thirty two bits wide) would be a 32 x n switch where 'n' would be the number of nodes in the cluster.
Assuming that the idea can fly we would need to develop the following:
1) The new network cards. This isn't as difficult as it seems as a lot of the work has already been done by every network card vendor. With modern ASICs the task of appearing to the system as a NIC whilst presenting the data to the port thirty two bits at a time could be dealt with by a single chip. All it needs is someone to design the chip. If we use standard forty-pin connectors then users can buy the cables off the shelf. To keep things on track we would need to implement all of the NIC functions including giving it a MAC address so that a TCP/IP stack could be implemented.
2) The network switch. A network switch handling data thirty two bits at a time is not a trivial item but I am sure that it can be done. A number of IC manufacturers have crosspoint switches as part of their catalogue and all that needs to be done is to expand the process further. Given the nature of the task it might be possible to carry out the switching using a hardware only solution which would reduce latency even further.
3) The software. Assuming that the new cards appear on the PCI bus as an ordinary NIC then drivers should not be much of a problem. These would probably have to be developed at the same time as the network card. Drivers should include all the required software so that the NIC can work with the kernel, but Windows drivers as well would be nice.
One final thought: this solution could also be applied to other fields. Want to build a SAN PC and wire it to a pair of servers running MySQL? Well, you now have a nice fast communication medium.
So, there you have it. Assuming this idea works then we now have a way to increase the speed of a network by reducing the latency rather than throwing more or faster CPUs at the problem. In the spirit of Open Source I do not propose to patent this idea. I want everyone to take the ideas presented here, play around with them, and if a university student is looking for his (or her) final year project they are welcome to give this a try. Should any of you have comments regarding this idea then post away. I should however point out that I'm a great fan of practical criticism: feel free to say that the idea sucks, but if you do, say WHY it sucks and HOW it can be improved."
Imagine a Beowulf cluster of these!
sell my girlfriend into slavery
... if she's cute ... and a goer.
I'll give you ten bucks
You might want to look into a custom solution using USB2.0 or Firewire. These can theoretically get you 300+ Megabytes per second (Limit of the PCI bus). It won't be an easy solution to pull off but it is definitely doable.
With 8 nodes, I don't think you even need a switch. Just put a couple of Intel 4-port 100 Mbit cards in there and link each node with each node.
That should give you lots more bandwidth and eliminate an expensive switch, and a few nanoseconds of latency.
Reinard
I don't know if this is a practical possibility, but IIRC Linux 2.4.X can load balance a single network connection over several physical NICs - could this not be a "quick and dirty" fix for your problem? This could be a starting point.
Try NetBSD... safe,straightforward,useful.
Both gigabit ethernet and Myrinet still have one fundamental weakness, a weakness that goes back to the original days of networking, they are a SERIAL medium. Even if you use the fastest technology possible you are still sending bits one at a time down a single pipe
As it happens, parallel interconnections' days are numbered because they are fundamentally limited as transmission speed increases. As the speed goes up you increasingly have problems with things like interactions between data lines and having the data arrive at the same time on each line. So, ironically, fewer lines means you can go faster and provide more bandwidth.
Try channel bonding. Add a 2nd NIC to each computer and another switch. This can double your bandwidth simply and rather cheaply.
Any way to utilize the AGP port as your network interface?
The bigger issue here is the length of the cable. To get decent bandwidth, parallel communication has to cut down on cable length (see SCSI, IDE). With serial communication you can go for hundreds of feet (ethernet). For a small rack, such as the one you describe, this might be possible - but cost-prohibitive. Think about the cost of an external ultra-160 cable - $100? For 1m? My prices are dated, probably, but still. Remember too that it's not bandwidth that's the problem in clusters - it's latency. That's why Myrinet is so damn costly. Even current cluster interconnects don't need more than 10Mb/s bandwidth - but they need as little latency as is possible.
Usually one runs highly parallelizable things on clusters like this, which means that the computation can be split into nodes easily, without having to constantly share much data between nodes. If you're not highly paralleled, then 12.5 megabytes a second (because that's what 100BaseT is) is going to slow you down less than having a slow front-side bus. (100 MHz? -- the point is, if /that/ is what limits your computation, versus your processor speeds, because you aren't parallelized, then maybe a cluster isn't your best bet.)
Consider:
If your nodes need to share more than 12.5 megs of data a second, then you might as well be running 100 megahertz processors.
Of course, I could just be talking out my ass.
I've been experimenting with Gigabit Ethernet lately.
The good news is that it's less expensive than you think. Decent cards are only marginally more expensive than good 100bT cards, and Netgear now makes a reasonably priced 8-port gigabit switch. It doesn't support jumbo frames but it's quite usable for small networks.
The bad news is that I'm finding that gigabit ethernet doesn't deliver the performance you might expect using traditional network protocols. NFS in particular sees only modest gains, even when using nfsv3 and increasing the block sizes and tuning the kernel buffers/TCP options. I'm still showing bandwidth bottlenecks on the network when I should be seeing bandwidth bottlenecks on the disk array.
It would appear that something isn't scaling. Given that network benchmarking tools do show gigabit ethernet performing at a reasonable speed, it would appear that most "legacy" protocols are not architected to take advantage of it.
Of course, the same cluster can be bound in different ways depending upon the applications that are being run. It is important to realize what the limitations are for your desired tasks and focus your improvements there. I have seen several clusters where they spent an ungodly amount of money on Myrinet and a massive amount of time getting it working when they were running easily-parallelizable tasks that were really bound just by the number of CPUs.
It's psychosomatic. You need a lobotomy. I'll get a saw.
This is why a beowulf cluster is not always the answer.
Depending on what you are trying to solve, the problem may need to be split up differently. The algorithm to solve the problem and the system you are using need to match well.
Beowulf is great for highly CPU-intensive tasks with low network usage. Other forms of clustering are good for problems that use shared memory (but this starts to nail the network). Some tasks split up so that a simple queuing system is all that is needed to do the work: all you need is a manager job that determines who does what and whether it was done.
-Tim
-I just work here... how am I supposed to know?
do you have pictures of your girlfriend online? Please post the URL. If she's hot I might be willing to trade for some myrinet cards.
Write a driver that passes data between machines via the SCSI interface. Put each host controller in the chain on its own ID, tie the networking part of the kernel into the SCSI part of the kernel, wave your magic wand and - **Presto!** Fast, parallel communications (with a lot of the headaches of the communication protocol taken care of by the SCSI command set -- allows for "concurrent" connections between multiple "devices" easily).
To scale, put multiple controllers in a high-bandwidth machine, moving data between chains. With 8 machines, there'd be no need because they could all fit on one.
Actually I had the pleasure of meeting the fathers :) of the original Beowulf (Don Becker and Tom Sterling) and their story goes that it was originally built exactly like this: a bunch of 4-port cards connected in hypercube configuration. By the way, for this case you can scale it to 2^4=16 nodes with 3 hops worst case latency.
The reason for this was that Don at the time was writing a linux driver for that particular card and needed some justification for that activity...
Problem with this approach is that you do message routing in software running on your computational nodes, not too efficient compared to dedicated hardware on a switch. Thus, switches were used ever since...
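For anyone who wants to play with the hypercube wiring, here is a minimal sketch. It assumes each node gets a 4-bit address and each port on the 4-port card flips one address bit; the function names are made up for illustration. Counted in links crossed, the farthest pair comes out 4 links apart, which means 3 intermediate nodes have to relay the message in software (possibly what the "3 hops" figure refers to).

```python
def hypercube_neighbors(node: int, dims: int) -> list[int]:
    """Nodes directly wired to `node`: each port flips one address bit."""
    return [node ^ (1 << d) for d in range(dims)]


def link_hops(src: int, dst: int) -> int:
    """Links crossed on a shortest path = number of differing address bits."""
    return bin(src ^ dst).count("1")


dims = 4                  # one dimension per port on a 4-port card
nodes = 1 << dims         # 2^4 = 16 nodes
worst = max(link_hops(0, n) for n in range(nodes))

print("nodes:", nodes)
print("neighbors of node 0:", hypercube_neighbors(0, dims))
print("worst-case links crossed:", worst)
print("worst-case relaying nodes:", worst - 1)
```

Every node on the path except the last has to forward the message itself, which is the software routing cost described above.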
Paul B.
Imagine one node of this thing on its own!
All things in moderation; including moderation
Why don't you just go out and buy a Cray???
According to pricewatch.com the current going price is $30 - $49
I do remember a while ago an article (which I believe was on /.) about a B'wolf like system that had two or three ethernet Base 100 cards per box instead of Gigabit. IIRC the author claimed that this was actually faster than a gigabit network due to various things (overhead with Gigabit, some other stuff). I also recall that one of the interesting aspects of the system was the way he had laid out the topology of the boxen and switches. Apparently he had done some analysis to determine how everything should be wired for maximum throughput / minimum switch saturation.
With my Beowulf underperforming, I was considering this idea:
the nodes are kept REALLY close together... maybe stacked. custom-made PCI cards have their entire 32-bit pins ganged together... However a simple and really fast (66MHz?) 32-bit latch or multiplexer is needed for each pci card. The 'bus' of 32 wires runs across all the nodes... practically this should be no more than 2 nodes. The cpu sends pci data to a specific address.
Now the pci card has one address where it listens from the outside.. the 32 wires. On the inside it simply blasts out all pci data to other pci cards, a bit like ethernets connected to a hub. Unless we can build a switching mechanism, 2 nodes will be easy to start with. (update: will need a simple buffer to counter the lack of synchronization between the pci clocks of the nodes) So when pci data comes in from the cpu, the card blasts it out to the 32 wires.. and the other card listens; if it hears its address, it puts the data in its buffer, which on the next clock interrupts the cpu and goes to the cpu.
Other variations include using just the DMA to put the data straight into a software-allocated region of memory. Please note I'm thinking multiprocessing systems rather than this gentleman's beowulf. With shared memory IPC, we can have a ball.
Using the PCI directly has the limitation of distance... not more than 2 feet I think. But I don't remember if this was for 33MHz or 66. I am also running the risk of not understanding the workings (and other unknown limitations) of PCI.
I hereby patent these Ideas under the BSD license.
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
The Klatz-2 implemented this technology... Here's a story on it and a link to their wiring app...
I think you mean the KLAT2 project and their Flat Neighborhood Networks.
Good stuff.
Remember kids, preview your posts.
Oracle has just released some firewire drivers/patches that they claim can be used to create firewire clusters. We use firewire at my work for networking on w2k machines and it's really fast, with lower latency than a 100Mbit network.
Wouldn't a parallel network have higher latency? As you increase the speed of a serial connection, it takes less time for data to get to its destination. If you have a parallel network, no matter how many paths you add, it still takes the same amount of time for data to get to its destination.
Am I missing something? I'm not an expert on networking.
Would it be possible to add a few firewire cards to the master node, and connect each node to the master pc?
I only mention this, because IEEE 1394 was created with networking in mind, as it (should) work fundamentally the same as ethernet. It runs at 400mbps (which is really what most copper gigabits are limited to).
In addition, once you pass the 400mbps mark, you also have to factor in the bus speed limitation of the nodes in the cluster. THE 32-BIT PCI BUS CANNOT TRANSFER HUGE AMOUNTS OF DATA. 64-bit pci is available on newer expensive workstation boards, but as far as costs are concerned, it's almost as impractical as myrinet (for which a NIC will set you back about $1500, and require pci-64 as well)
A final suggestion: Why not put 3-4 100mbps ethernet cards in each node? I believe this is possible, and has been done before... Since 100mbps cards cost under $10, this seems like a very attractive solution. For even more speed, it might be possible to put each nic on a different subnet, and use 3-4 switches - but I don't know if this would be possible or even help.
-- If you try to fail and succeed, which have you done? - Uli's moose
I think that this article might be useful to you. It describes a commercially available NIC that does much of the network processing on the NIC.
I think that the price is a bit out of your range for now. My understanding is that they are running about $1000 each. Wait, and the price will drop, tho.
HDGary secures my bank
Picture the following scenario (we'll use 4x parallelism as an example, you could go with whatever number you fancy):
You have an alternate protocol, let's call it PIP (Parallel IP), and you code for it in your network stack. It acts like IP and normal TCP/UDP will run on top of it. On a per-interface basis you decide that your public card talks IP and the 4 cards to the private beowulf network talk PIP.
In some config file, you define that eth1-eth4 are part of a PIP group with a single IP address and four MAC addresses. You probably have to define all the beowulf nodes in a central file that you distribute to all the nodes (so they know the mac/ipaddrs of the remote nodes).
The PIP stack takes each would-be IP frame, and instead of encapsulating it in an ethernet frame bound for the single mac-address of the recipient IP address, it splits the packet into four chunks with unique serial number 34534568 and sequence numbers 1-4 in an extremely short PIP header that goes on top (inside the ethernet framing). You look up the 4 MACs for the destination IP (PIP) address, and you send packet #1 out your MAC#1 destined for their MAC#1, etc...
On the receiving side, the PIP stack looks for all four sequence numbers to reconstruct packet 34534568. When and if it receives all four, they are reconstructed and passed upwards. After some minimal timeout value (a few milliseconds at most?) if only some of the sequence numbers have been received, toss the packet out and let TCP (or the app in case of UDP) deal with the lost packet; after all, IP isn't guaranteed delivery.
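The split and reassemble steps can be sketched in a few lines. This is only an illustration of the idea, not a real network stack: the 6-byte header layout (serial number, sequence number, chunk count), the function names `pip_split` and `pip_reassemble`, and the lack of any timeout handling are all assumptions made for the sketch.

```python
import struct
from typing import Optional

# Hypothetical PIP header: 32-bit serial, 8-bit sequence number, 8-bit chunk count.
HEADER = struct.Struct("!IBB")


def pip_split(payload: bytes, serial: int, links: int = 4) -> list[bytes]:
    """Split one would-be IP packet into `links` chunks, numbered 1..links."""
    size = -(-len(payload) // links)  # ceiling division
    return [
        HEADER.pack(serial, seq, links) + payload[(seq - 1) * size:seq * size]
        for seq in range(1, links + 1)
    ]


def pip_reassemble(frames: list[bytes], serial: int) -> Optional[bytes]:
    """Rebuild packet `serial`; return None if any chunk is missing."""
    chunks = {}
    count = None
    for frame in frames:
        frame_serial, seq, total = HEADER.unpack_from(frame)
        if frame_serial != serial:
            continue  # belongs to some other packet
        chunks[seq] = frame[HEADER.size:]
        count = total
    if count is None or len(chunks) != count:
        return None  # incomplete: drop it and let TCP (or the app) cope
    return b"".join(chunks[seq] for seq in range(1, count + 1))


packet = b"x" * 1000 + b"tail"
frames = pip_split(packet, serial=34534568)
print(len(frames), "frames of", len(frames[0]), "bytes each")
print("reassembled ok:", pip_reassemble(frames[::-1], 34534568) == packet)
print("with one frame lost:", pip_reassemble(frames[:3], 34534568))
```

Arrival order does not matter because each chunk carries its own sequence number; a real version would also need the timeout described above before giving up on a packet.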
I'm of course leaving out lots of little implementation problems, but that's what you get in a 5-minute slashdot idea
You might also want to handle ICMP in some custom fashion, like sending a ping down all four mac-pair-paths and telling the ping program you got a valid echo when/if you see all four replies.
The downside of course is that it's not really guaranteed to give any latency decrease due to the problem of lost packets and waiting for all four to sync up and whatnot - although it would almost definitely result in an overall bandwidth increase for a tcp stream.
Anyways, food for thought...
11*43+456^2
...as there's a built-in /.() function. :-)
Informatus Technologicus
http://www.networkcomputing.com/1318/1318ibg12.html
HDGary secures my bank
You don't say why you need optimum performance, but I'm guessing that since you are a sysadmin, you had some parts and some curiosity.
Just what is it you do with this hobby, other than annoy your gf?
You just said it: your switch doesn't support jumbo frames. This is critical to get decent performance out of GigE. The explanation is simple: GigE still uses CSMA/CD, even though I don't know of any GigE hub. Every time a NIC needs to send out a frame, it needs to listen to see if anyone is already transmitting. This is defined as a number of ms and is the same length of time from 10Mbps to GigE, to allow compatibility. So the longer your frames are, the smaller the relative time wasted waiting. Also the ethernet frame header is the same no matter what the frame size is, so the overhead cost gets less important with larger frames.
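The fixed-cost-per-frame point can be put in rough numbers. This sketch uses the standard Ethernet per-frame overhead (8 bytes preamble, 14 bytes header, 4 bytes FCS, 12 bytes inter-frame gap), which is not taken from the post above, and assumes 9000 bytes as the jumbo payload size since that is the common convention. It ignores IP/TCP headers and per-packet CPU cost, so treat it as a lower bound on what bigger frames buy you.

```python
# Per-frame cost on the wire in bytes: preamble+SFD, MAC header, FCS, inter-frame gap.
OVERHEAD_BYTES = 8 + 14 + 4 + 12


def wire_efficiency(payload_bytes: int) -> float:
    """Fraction of wire time spent carrying payload for a given frame size."""
    return payload_bytes / (payload_bytes + OVERHEAD_BYTES)


for payload in (46, 1500, 9000):  # minimum, standard maximum, assumed jumbo size
    print(f"{payload:5d} byte payload: {wire_efficiency(payload):.1%} of the wire is data")
```

The bigger win from jumbo frames is usually fewer frames per second for the host to process, which this little calculation does not capture.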
To do this, you only need four interfaces per node (actually, you need two transmit and two receive, so could get away with two, but let's stick to four so we don't have to hack network layer code).
Assuming you have 2^n nodes, you number them from 0 to 2^n-1. Node i is configured to transmit to nodes 2i mod 2^n and (2i+1) mod 2^n. By extension, it receives from nodes floor(i/2) and floor(i/2) + 2^(n-1).
This lets you get a message from one node to another in at most n hops with 2^n nodes and four (or two) nics per node.
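A quick way to check the hop count claim is to build the wiring and run a breadth-first search from every node. In this sketch the node index is called `i` and the exponent is called `bits` to keep the two apart; the function names are made up for illustration.

```python
from collections import deque


def successors(i: int, bits: int) -> tuple[int, int]:
    """The two nodes that node i transmits to."""
    size = 1 << bits
    return (2 * i) % size, (2 * i + 1) % size


def predecessors(i: int, bits: int) -> tuple[int, int]:
    """The two nodes that node i receives from."""
    return i // 2, i // 2 + (1 << (bits - 1))


def max_hops(bits: int) -> int:
    """Worst-case shortest-path length over all ordered pairs of nodes."""
    size = 1 << bits
    worst = 0
    for start in range(size):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in successors(node, bits):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        if len(dist) != size:
            raise ValueError("some node is unreachable")
        worst = max(worst, max(dist.values()))
    return worst


for bits in range(1, 6):
    print(f"{1 << bits:3d} nodes: at most {max_hops(bits)} hops")
```

Each hop shifts one more bit of the destination address into the low end of the node number, which is why the worst case equals the number of address bits.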
You could've hired me.
The problem in most beowulf-type clusters is not bandwidth, it's _latency_. That's what you pay for in Myrinet cards: an on-card cpu which gives ridiculously low latencies. And that's what BigIron has: backplanes which allow not only high throughput, but esp. low latency between cpus.
The area where it's useful to use a cluster is in problems where you can split the calculation in small pieces. Typically you don't need to send tons of data across processors, but you need to do it often and quickly. That's where latency kills you.
So you have an idea which is a solution looking for a problem. Now, since others have already pointed out that high-bandwidth parallel systems are ultimately a bad idea, it turns out you have a _bad_ solution looking for a problem.
I'm not saying that there aren't ways of improving life in the clustering world. But yours isn't one of them. Sorry.
Sounds like a lot of complexity with very little gain. Instead, I'd put two GIGABIT cards in each computer and wire them in a circle.
But the hardware issues are trivial. What we really need is a free microkernel OS that implements shared memory and message-passing across nodes.
One of the things with beowulf clusters is that it's not necessarily the bandwidth that you need to worry about, but rather the latency of the network. For instance, if I ping a machine on my 100mbps network, I get a round trip time of .350 ms. Myrinet, on the other hand, has latencies in the single digit microseconds. Those interfaces are there less for the amount of data that they can push, and more for their response time, giving the cluster a more "whole" feel.
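To see why latency dominates for small messages, a simple model helps: time to deliver a message is a fixed latency plus the time to clock the bits out. The numbers here are assumptions for illustration: 175 microseconds one way for 100 Mbit ethernet (half of the 0.350 ms ping above) and 5 microseconds for a low-latency interconnect (somewhere in the "single digit microseconds" range). Both links are given the same 100 Mbit/s bandwidth on purpose, to isolate the effect of latency.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_mbit: float) -> float:
    """One-way delivery time in microseconds: fixed latency + serialization time."""
    return latency_us + message_bytes * 8 / bandwidth_mbit


ETHERNET_LATENCY_US = 175.0   # assumed: half of a 0.350 ms round trip
LOW_LATENCY_US = 5.0          # assumed: "single digit microseconds"
BANDWIDTH_MBIT = 100.0        # same for both links, to isolate latency

for size in (64, 1_000_000):
    slow = transfer_time_us(size, ETHERNET_LATENCY_US, BANDWIDTH_MBIT)
    fast = transfer_time_us(size, LOW_LATENCY_US, BANDWIDTH_MBIT)
    print(f"{size:>9} bytes: {slow:10.2f} us vs {fast:10.2f} us ({slow / fast:.1f}x)")
```

For the 64-byte message almost all the time is latency, so the low-latency link wins by a large factor; for the 1 MB message the two are nearly identical and only more bandwidth would help.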
If your goal is to increase bandwidth, I would use 4-port 100mb cards and make a hypercube out of the thing. You won't be limited by the backplane of your switch (which is what? if it's a $100 switch, don't expect to be pushing 3 machines at full keel down it), since they will have dedicated 100mbps connections to each node. Relatively cheap, and many examples out there.
You might also take a look at SCALI, although it comes into the range of myrinet, but I think it does exactly what you want.
Gigabit switches with all ports being gigabit are about $450, and are quite speedy.
One potential is for you to look at VIA over gigabit, or to use PM (GM is for myrinet, this is from the real world computing somethingorother in Japan). Essentially, the issue with doing regular tcp/ip communications is that tcp/ip is slow, but pretty fault tolerant. These other protocols are not nearly so fault tolerant, but since they are on dedicated networks, they don't need to be.
-- Who is the bigger fool? The fool or the fool who follows him? --
HIPPI is a high speed parallel interconnect.
You'd need to design a switch for it yourself however as it's a point to point system.
It currently tops out at 1600Mb/s which is greater than what 33 MHz PCI can do anyway.
I'm pretty sure the 1394b (Firewire) standard was designed to go up to 1600Mb/s as well.
You notice those squiggly lines on your motherboard? They're not just to look pretty, they actually slow the signal down so that the 8, 16, 32, 64, etc parallel bits all arrive at their destination at the exact moment before the clock signal latches in the data.
:-)
Parallel solutions on flexible cable will not be viable anywhere near the average front side bus speed, never mind faster interfaces. Sorry.
A not too recent slashdot article pointed out a unique way to manage large data transfers among clusters though:
As a simple example, each computer has N 100Mb interfaces. There are a total of J computers. you use K hubs (or, ideally, switches), and connect them together such that one computer has direct hub access to a portion (or all) of the cluster. You can hook the hubs together using a managed switch or router if you want to enable them to contact all the computers in under two hops, without routing through other computers. This is very flexible - you could have it set up so each computer has more than one line going to other computers. Using managed switches you could even reconfigure it on the fly, though your latency goes up.
The idea here is that each computer has a possible bandwidth of multiple times the network card speed. Furthermore, collisions are down, and latency is down. Throughput can reach higher than the practical 80% of ethernet bandwidth.
The real problem with ethernet is latency and collisions. The reason it's used is because it's dirt cheap. Dirt Cheap. Commit that to memory. It's often better to buy a lot of slow devices than one expensive really fast device. You can overcome most of the limitations of the slow devices with other tricks.
So, while your idea has some merit, the reality is much different than the ideality.
-Adam
See my post above for some good info.
Use multiple firewire ports. Many newer chipsets are coming with 2 firewire ports built in, NOT on the PCI bus. You wouldn't have the PCI bus limit of ~132MByte/second. 2 built in, plus whatever you wanted to put on the PCI bus.
Also, 66mhz/64bit PCI has 528MByte/second available on the bus, so firewire on 64/66pci would give you ~11 ports at full speed.
Many machines that have 64/66pci have multiple pci buses so you have 528MByte/second per bus.
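The bus numbers are easy to reproduce. This sketch computes theoretical peak PCI bandwidth from nominal clock and bus width and divides by a 400 Mbit/s firewire port; real sustained throughput will be lower because of arbitration and protocol overhead, so these are upper bounds only.

```python
def pci_mbytes_per_s(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak: one transfer of `width_bits` per clock cycle."""
    return clock_mhz * width_bits / 8


FIREWIRE_MBYTES_PER_S = 400 / 8  # a 400 Mbit/s port moves 50 MByte/s

pci_32_33 = pci_mbytes_per_s(33, 32)
pci_64_66 = pci_mbytes_per_s(66, 64)

print("32-bit/33MHz PCI:", pci_32_33, "MByte/s")
print("64-bit/66MHz PCI:", pci_64_66, "MByte/s")
print("firewire ports to fill 32/33 PCI:", pci_32_33 / FIREWIRE_MBYTES_PER_S)
print("firewire ports to fill 64/66 PCI:", pci_64_66 / FIREWIRE_MBYTES_PER_S)
```

On paper a plain 32-bit/33MHz bus is already saturated by fewer than three firewire ports, which is the reason for wanting ports that hang off the chipset rather than the PCI bus.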
I have a system where the slowest CPU runs at 600MHz but the network links run at (assuming a single start and stop bit) ten million bytes a second.
Start and stop bits? In Ethernet?
Huh?
Really, the thing that needs to be done is improve the *protocol* that the communication rides upon. If you're using gigabit, usually the limiting factor is the processor - you max the processor (doing the overhead for TCP/IP) before you even come close to filling up the gigabit pipe. There are technologies like Infiniband which try to address that, but progress has been very slow.
There is a big push for TCP offload- to take the effort for TCP and put it into an ASIC- check out this paper. TCP offload is absolutely necessary if we want to go beyond 1 gigabit ethernet any time soon, because the rate that we can communicate is beating the rate of increase in processing. IIRC Moore's law is an 18 month doubling time, there is a similar law for communications speed that says it doubles at a 12 month rate.
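Taking those two doubling times at face value (they are recalled from memory above, so treat them as assumptions), the gap between link speed and processing power can be worked out directly:

```python
def growth(months: float, doubling_months: float) -> float:
    """Growth factor after `months` for something that doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def comms_vs_cpu_gap(months: float) -> float:
    """How much faster communications have grown than processing after `months`."""
    return growth(months, 12) / growth(months, 18)


for years in (1, 3, 6):
    print(f"after {years} years the gap is {comms_vs_cpu_gap(years * 12):.2f}x")
```

With these figures the gap itself doubles every 36 months, which is the argument for moving TCP processing off the main CPU.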
In addition, engineering a device with so many parallel channels is pretty hard - when you start doing high speed communications, you have to take a *lot* of care to make sure that the signals get there as you want them. This is why Intel appears to be in the process of moving away from PCI (a parallel bus) and moving to PCI-Express/3GIO, a combination parallel/serial bus, with a variable number of serial channels. PCI-Express is designed to carry on 16 wires what PCI-X carries on 80+ wires. When you actually have to make hardware - this makes a *HUGE* amount of difference.
Actually - there are a lot of other posts here that talk about why parallel isn't necessarily better. At high speeds with long cables it can be a pain in the ass to keep all the bits lined up. Sometimes it is easier to develop hardware that does serial faster than it is to add more wires and do it parallel.
Absolutely wrong,[well... not absolutely :)]
... that's all you need... latency may or may not be important in that situation.
:)
Latency is nowhere near as important as bandwidth. If you can successfully overlap communication with computation in say an MPI application you achieve much better wall clock times for the runs of big jobs.
I have no idea why people are so obsessed with latency in clusters... perhaps the apps are coded with all blocking communication semantics and cannot do the overlap I described above. Basically if you can stream data to the processors and keep them busy
I actually develop a commercial MPI implementation and we have seen real evidence of my claims above. If you are going to SuperComputing 2002 in Baltimore Maryland next week stop by the MPI Software Technology Inc. booth [I won't actually be there... but some of my demos will be]... ask some questions and try to talk to one of our technical guys... they may be better able to explain this since I am not a senior engineer.
[This should be taken not as a representative statement on behalf of the company I work for but as my own humble interpretation of the real issues].
You've got an interesting idea here but, frankly I think you have forgotten a couple of steps along the way.
Whenever you increase the performance of one area in a system, another area becomes the bottleneck. Right now, your bottleneck is assumed to be the network.
As you said, you could possibly do bonding on the interfaces to multiply the bandwidth. However, to accomplish this you will need a better switch that is capable of this bonding. Nortel offers a small switch that can do this; the Baystack 450 is the lowest end capable of what they call multi-link trunking. Cisco offers several switches that are also capable of this. Cisco calls it ether-channel and I think the Catalyst 3500 is their entry point switch with this capability. These switches can bond up to 4 links, I believe, giving you up to a 400Mbps trunk per switch. To do this with multiple systems you would really need a larger and significantly more expensive switch.
You dismissed Gigabit saying that you will suffer the same limitations due to serial communication. However, I believe that Gigabit will be the best solution in this case. Your gigabit performance will not be limited by serial communication as you think. Gigabit communication will be limited by the PCI bus in the system. In fact, your system will not be able to drive gigabit cards beyond 600Mbps because of the PCI bus speed. If you are plugging a disk controller into this PCI bus and further sharing the PCI bandwidth you will further limit the performance that you can expect from your gigabit cards.
There are only two ways around this limitation and only by combining them will you be able to truly maximize the gigabit performance. The first method is to have a system with dual peer PCI buses. This allows you to run disk controllers on one bus and network controllers on another, giving you the full bandwidth of the bus to each controller. This still limits you to around 600Mbps though. To go beyond that, you need to have a 64 bit PCI bus, preferably dual peer. The drawback here is that you will need to acquire a true high-end server to get these features. Desktop systems and low-end servers do not have 64 bit PCI buses and rarely have dual peer PCI buses.
So, basically gigabit ethernet will provide the highest possible performance on your existing systems. If you want to go beyond that you need to replace your systems with high-end servers with multiple 64 bit PCI buses, running multiple gigabit NICs, multi-link trunked to a high-end gigabit switch.
One final note about your parallel processing idea: it's a good idea. It's also been done, more or less. 3Com NICs have had a somewhat similar feature that they call "Parallel Tasking" for years now. The Parallel Tasking cards offload network operations to a processor on the NIC, leaving the main CPU free for other operations. I'm sure there are other NIC manufacturers that do similar things but 3Com has always hyped up this feature so they are the first to come to mind. But no matter how fast your processor and your NIC, you still have to cross a PCI bus and at 32 bits wide it is the PCI bus that will be your bottleneck.
For the record, a quick and easy way to increase your network performance is to use quality network cards. Once upon a time I believed that 100Mb cards all moved data at 12.5 mega bytes per second (theoretical max throughput at 100megabits per second) but testing showed that good cards ( Intel, SMC, 3Com, etc...) moved data upwards of TWICE as fast as the cheap cards (Addtron, Eagle, NE2000 clones, etc...) I didn't test latency but envision similar results.
Glonoinha the MebiByte Slayer
Finally, a few of you have suggested that bit skew would be a problem due to cable length at high speeds. I don't think this is so otherwise ATA 100/133 hard drives wouldn't work.
As I suggested in the original post: is there a Computer Engineering student who would like to take this up as a final year project?
Peter Gant
It sounds like you want a NIC with 2 100mbit links that acts as 1 200mbit link. OK, but how is that really a new architecture over using some sort of software channel bonding? It is built into linux, and was developed (in linux at least) for the original beowulf cluster. With it, one can easily use one or two large switches and a couple of quad fast ethernet cards to get 400 or 800 mbits of bandwidth in and out of nodes.
And the idea is also firmly established in the NAS market. I don't think it is in use in the SAN market, but that is probably because most SAN vendors are using either GigE or FC-AL. However, once iSCSI is more commonly available (I believe it is in progress for NetBSD, don't know what the linux status might be), I'd imagine running it over channel bonded links would also be no trouble.
I'm a loser baby, so why don't you kill me.