Developing a New Beowulf Architecture?
"Both gigabit Ethernet and Myrinet still have one fundamental weakness, a weakness that goes back to the original days of networking: they are a SERIAL medium. Even if you use the fastest technology possible you are still sending bits one at a time down a single pipe. It's like having a single-lane highway between L.A. and San Francisco with each car running at 10,000 mph so that you can cope with the bandwidth; it might work but it's a damn silly solution. I therefore propose a new networking solution for use in cluster systems: parallel networking. This isn't as silly as it sounds, because we use this solution at work to link two switches: two 100Mb network connections are concatenated together to form a single 200Mb link. What I propose goes further.
The new system takes advantage of the seven-layer OSI model and separates the new hardware from the operating system. So far as the system is concerned each node has a single network card, but the interface is where I propose the change. Every network card includes one or more shift registers which take the parallel information off the PCI bus and convert it to a serial bit stream so that it can be sent along the network cable; when data is received the hardware operates in reverse, converting serial to parallel. The new cards replace these shift registers with thirty-two (or maybe sixteen) bit latches, and the network connector at the back of the card has (say) forty pins. This would allow the use of thirty-two pins for data and eight for handshaking, and if the new eighty-core IDE cables are used then crosstalk would not be a problem. It's a similar approach to the Digital Video Out connector on some high-end video cards, which allows you to connect a flat-screen monitor without going through the D to A converters. Each node has its own cable connecting into the network switch which (as the connections are now thirty-two bits wide) would be a 32 x n switch, where 'n' would be the number of nodes in the cluster.
Assuming that the idea can fly we would need to develop the following:
1) The new network cards. This isn't as difficult as it seems as a lot of the work has already been done by every network card vendor. With modern ASICs the task of appearing to the system as a NIC whilst presenting the data to the port thirty two bits at a time could be dealt with by a single chip. All it needs is someone to design the chip. If we use standard forty-pin connectors then users can buy the cables off the shelf. To keep things on track we would need to implement all of the NIC functions including giving it a MAC address so that a TCP/IP stack could be implemented.
2) The network switch. A network switch handling data thirty two bits at a time is not a trivial item but I am sure that it can be done. A number of IC manufacturers have crosspoint switches as part of their catalogue and all that needs to be done is to expand the process further. Given the nature of the task it might be possible to carry out the switching using a hardware only solution which would reduce latency even further.
3) The software. Assuming that the new cards appear on the PCI bus as an ordinary NIC, drivers should not be much of a problem. These would probably have to be developed at the same time as the network card. Drivers should include all the required software so that the NIC can work with the kernel, but Windows drivers as well would be nice.
One final thought: this solution could also be applied to other fields. Want to build a SAN PC and wire it to a pair of servers running MySQL? Well, you now have a nice fast communication medium.
So, there you have it. Assuming this idea works then we now have a way to increase the speed of a network by reducing the latency rather than throwing more or faster CPUs at the problem. In the spirit of Open Source I do not propose to patent this idea, I want everyone to take the ideas presented here, play around with them, and if a university student is looking for his (or her) final year project they are welcome to give this a try. Should any of you have comments regarding this idea then post away. I should however point out that I'm a great fan of practical criticism, feel free to say that the idea sucks but if you do say WHY it sucks and HOW it can be improved."
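To make the shift-register versus latch distinction in the proposal concrete, here is a minimal Python sketch. It models a word being shifted out one bit per clock tick and counts the ticks needed over 1 line versus 32 lines. The function names and the least-significant-bit-first order are illustrative assumptions, not anything from the post, and the comparison assumes every line can be clocked at the same rate.

```python
def serialize(word, width=32):
    """Model a parallel-in/serial-out shift register: one bit per clock tick.
    Least significant bit first (an arbitrary choice for this sketch)."""
    return [(word >> i) & 1 for i in range(width)]


def deserialize(bits):
    """Model the receiving side: rebuild the word from the serial bit stream."""
    word = 0
    for i, bit in enumerate(bits):
        word |= bit << i
    return word


def ticks_to_send(num_words, lanes, width=32):
    """Clock ticks needed to move num_words words over `lanes` data lines,
    assuming every line runs at the same clock rate."""
    total_bits = num_words * width
    return -(-total_bits // lanes)  # ceiling division


if __name__ == "__main__":
    word = 0xDEADBEEF
    assert deserialize(serialize(word)) == word
    print("serial ticks:  ", ticks_to_send(1000, lanes=1))
    print("parallel ticks:", ticks_to_send(1000, lanes=32))
```

The 32x advantage only holds if the per-line clock rate really is the same in both cases.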
Both gigabit ethernet and Myrinet still have one fundamental weakness, a weakness that goes back to the original days of networking, they are a SERIAL medium. Even if you use the fastest technology possible you are still sending bits one at a time down a single pipe.
As it happens, the days of parallel interconnects are numbered because they are fundamentally limited as transmission speed increases. As the speed goes up you increasingly have problems with things like crosstalk between data lines and getting the data to arrive at the same time on each line. So, ironically, fewer lines means you can go faster and provide more bandwidth.
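A rough way to see the skew argument is to compare a fixed amount of line-to-line skew against the bit period at different per-line rates. The sketch below does that arithmetic in Python. The 0.5 ns skew figure is purely an illustrative assumption, not a measured value for any real cable.

```python
def bit_period_ns(rate_mbit_per_line):
    """Duration of one bit on a single line, in nanoseconds."""
    return 1000.0 / rate_mbit_per_line


def skew_fraction(rate_mbit_per_line, skew_ns):
    """Fraction of the bit period consumed by line-to-line skew."""
    return skew_ns / bit_period_ns(rate_mbit_per_line)


if __name__ == "__main__":
    ASSUMED_SKEW_NS = 0.5  # hypothetical skew between the fastest and slowest line
    for rate in (100, 1000):
        frac = skew_fraction(rate, ASSUMED_SKEW_NS)
        print(f"{rate} Mbit/s per line: skew uses {frac:.0%} of the bit period")
```

The same skew that is a small nuisance at 100 Mbit/s per line eats half of the sampling window at 1000 Mbit/s per line, which is the sense in which adding lines stops paying off as the clock goes up.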
Of course, the same cluster can be bound in different ways depending upon the applications that are being run. It is important to realize what the limitations are for your desired tasks and focus your improvements there. I have seen several clusters where they spent an ungodly amount of money on Myrinet and a massive amount of time getting it working when they were running easily parallelizable tasks that were really bound just by the number of CPUs.
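As a back-of-the-envelope check on whether a job is CPU-bound or network-bound, a toy model along these lines may help. It splits per-node time into a compute term and a communication term. The model, its parameter names, and the example numbers are all assumptions for illustration; real applications overlap computation with communication and behave less tidily.

```python
def per_node_times(work_s, nodes, messages, bytes_sent, latency_s, bandwidth_bytes_s):
    """Return (compute_seconds, communication_seconds) for one node.

    Toy model: the work divides evenly across nodes, and each node pays the
    latency once per message plus its transfer time for the bytes it sends.
    """
    compute = work_s / nodes
    comm = messages * latency_s + bytes_sent / bandwidth_bytes_s
    return compute, comm


def bottleneck(compute, comm):
    """Name the larger of the two terms."""
    return "cpu" if compute >= comm else "network"


if __name__ == "__main__":
    # Hypothetical easily parallelizable job: one hour of work, 16 nodes,
    # 10 small messages and 1 MB of traffic per node on 100 Mbit Ethernet.
    compute, comm = per_node_times(
        work_s=3600, nodes=16, messages=10, bytes_sent=1_000_000,
        latency_s=100e-6, bandwidth_bytes_s=12.5e6,
    )
    print(compute, comm, bottleneck(compute, comm))
```

With numbers like these a faster interconnect changes almost nothing, while doubling the node count halves the dominant term.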
Not to mention that neither FireWire nor USB 2.0 approaches that speed. They are 400 Mbit/s and 480 Mbit/s respectively. I really don't know what the original poster was on. Maybe it was deliberate misinformation, but I just don't see the point of that.
USB 2.0: 480 Mbit/s -> 60 MB/s
FireWire: 400 Mbit/s -> 50 MB/s
which is still a lot faster than 100 Mbit cards
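The conversions above are just a division by eight. A small Python sketch, using the raw signalling rates quoted in this comment and ignoring protocol overhead (so real-world throughput would be lower):

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a raw line rate from megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_s / 8


if __name__ == "__main__":
    for name, rate in (("USB 2.0", 480), ("FireWire", 400), ("Fast Ethernet", 100)):
        print(f"{name}: {rate} Mbit/s -> {mbit_to_mbyte(rate)} MB/s")
```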
Actually I had the pleasure of meeting the fathers of the original Beowulf (Don Becker and Tom Sterling) :) and their story goes that it was originally built exactly like this: a bunch of 4-port cards connected in a hypercube configuration. By the way, for this case you can scale it to 2^4=16 nodes with a worst case of 3 intermediate hops (4 links) between any two nodes.
The reason for this was that Don at the time was writing a Linux driver for that particular card and needed some justification for that activity...
The problem with this approach is that you do message routing in software running on your computational nodes, which is not too efficient compared to dedicated hardware on a switch. Thus, switches have been used ever since...
Paul B.
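For anyone curious how a switchless hypercube of 4-port cards would be addressed, here is a minimal Python sketch. Each node gets a 4-bit address, each port connects to the node whose address differs in exactly one bit, and messages are forwarded with simple dimension-order routing. The function names are illustrative, and nothing here is claimed to match how the original Beowulf software actually did it. Counting links traversed, the worst case between two of the 16 nodes is 4, which means 3 intermediate nodes have to forward the message in software.

```python
def neighbours(node, dim):
    """Nodes directly cabled to `node`: flip each of the `dim` address bits."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hop_count(src, dst):
    """Links on a shortest path = number of address bits that differ."""
    return bin(src ^ dst).count("1")


def route(src, dst, dim):
    """Dimension-order routing: correct differing bits from lowest to highest."""
    path = [src]
    node = src
    for bit in range(dim):
        if (node ^ dst) & (1 << bit):
            node ^= 1 << bit
            path.append(node)
    return path


if __name__ == "__main__":
    DIM = 4  # assumed: one hypercube dimension per port on a 4-port card
    nodes = range(2 ** DIM)
    worst = max(hop_count(a, b) for a in nodes for b in nodes)
    print("neighbours of node 0:", neighbours(0, DIM))
    print("route 0 -> 15:", route(0, 15, DIM))
    print("worst-case links:", worst)
```

Every intermediate entry in the route is a compute node spending CPU time on forwarding, which is the inefficiency a dedicated switch removes.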
Actually I remember that an IP-over-SCSI driver was implemented back in pre-1.0 kernel days! I doubt it would be easier to port it to 2.5.xx than to write it from scratch though...
Paul B.
Finally, a few of you have suggested that bit skew would be a problem due to cable length at high speeds. I don't think this is so; otherwise ATA 100/133 hard drives wouldn't work.
As I suggested in the original post: is there a Computer Engineering student who would like to take this up as a final-year project?
Peter Gant