x86 Commodity-Hardware Router?
neomage86 asks: "I recently had to set up a router for a small company, only five users at any given time, and the needed VPN capabilities are built in. So, instead of using a Cisco or other embedded router, I decided to just install Linux and IPTables on an old 200 MHz PII I had lying around. It's been working fine, and I'm thinking about doing something like this for a much larger network (3000+ users). Does anyone have suggestions on how much I will have to beef up the hardware to provide IP Masquerading for about 1000 users on a T3; provide network-layer filtering of the transmission; and route between 4-5 internal subnets?"
VPN can be a real resource hog... word is, though, that the VIA C3 has some sort of processor-level instructions to help accelerate this. Has anybody else heard of this?
I would personally go with a BSD flavor rather than Linux. Don't get me wrong, Linux is great, but BSD was designed with routing in mind. You will be able to get away with less hardware, and out of the box something like OpenBSD is going to be more secure than a commodity Linux.
You simply cannot pump that much through a standard PC... unless anyone knows whether those quad cards can route between connectors faster (much, much, muuuuuch faster) than the PCI bus will allow.
If it's 100baseT, 4 x 12.5MB/s = 50MB/s is easily within the capabilities of a standard 32-bit/33MHz PCI bus (100MB/s sustained), at least in terms of transfer rate. Make sure to use a card whose drivers support polling (aka NAPI on Linux).
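For anyone who wants to redo that back-of-envelope arithmetic with their own numbers, here is a rough Python sketch. The 100MB/s sustained figure is the one quoted above; the theoretical peak (bus width times clock) and the function names are my own additions for illustration, not anything measured.

```python
def link_mbytes_per_s(mbits_per_s: float) -> float:
    """Convert a link rate in Mbit/s to MB/s (8 bits per byte)."""
    return mbits_per_s / 8.0


def pci_peak_mbytes_per_s(bus_width_bits: int = 32, clock_mhz: float = 33.33) -> float:
    """Theoretical PCI peak: bytes per transfer times transfers per microsecond."""
    return bus_width_bits / 8 * clock_mhz


PORTS = 4                  # quad-port card, as discussed above
PORT_RATE_MBIT = 100.0     # 100baseT
PCI_SUSTAINED_MB = 100.0   # sustained figure quoted in the post above

per_port = link_mbytes_per_s(PORT_RATE_MBIT)   # 12.5 MB/s
aggregate = PORTS * per_port                   # 50.0 MB/s, one direction
pci_peak = pci_peak_mbytes_per_s()             # about 133 MB/s
headroom = PCI_SUSTAINED_MB - aggregate        # what is left on the bus

print(f"per port:  {per_port:.1f} MB/s")
print(f"aggregate: {aggregate:.1f} MB/s")
print(f"PCI peak:  {pci_peak:.1f} MB/s (theoretical)")
print(f"headroom:  {headroom:.1f} MB/s against the sustained figure")
```

One caveat worth keeping in mind: on an ordinary PCI NIC a forwarded packet crosses the bus twice (card to RAM, then RAM to card), so if all four ports were saturated in both directions the bus load would be closer to 100MB/s than 50MB/s.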
First off, the case itself was one of the 'all in one' deals, simple one-5.25 bay, one-HD bay, one-floppy, half-height PCI cards only, etc.
:-)
The P2 was a typo, and one I apologize for. P3 would be much more accurate, and overlooking the typo is inexcusable, as I was simply typing quietly before I hit post and didn't read the entire post from the beginning before hitting post.
As for the T1s, we didn't use any PCI T1 cards. We used an external 10/100/1000 switch with all 8 T1s plugged into it via normal T1-to-10/100 converters as a concentrator, with the uplink port plugged into the computer. Four 10/100 PCI half-height network cards + onboard were installed; three + onboard were used. The onboard port led to the switch with the T1s on it, and the individual network cards each led to an individual subnet.
As for the downclocking, yes, we had to throw jumpers. And as I said, it was policy at the time, and one I didn't completely agree with, but it did noticeably lower the heat output on the CPUs, which was often a problem when we had to install these things under bleachers or in other areas with absolutely zero ventilation and little access. In one case, we had to repurpose a bathroom actually, speaking of those. For that specific reason, the downclocking made sense.
The configuration of the multiple T1s on one ethernet port was fairly simple, using the aliasing features of Linux to pretend to be 8 separate ethernet cards plugged into that one switch, leading to each of the 8 T1 converters.
And yes, the CPU had little cache, and slow cache to boot, but lots of memory, and with that configuration it wasn't dealing with much data, barely a fraction of the actual network traffic, because all the network cards we'd installed could copy data directly from their own buffers to other network cards. That's the fastcopy option under Linux networking in the kernel, IIRC.
If you have any more questions, feel free to post again though.
Okay, point-by-point again.
The 'T1 to 10/100 converters' are just common T1 interface boxes that output ethernet instead of 24 voice/data jacks. Data-only T1 interfaces, essentially. Unfortunately, that was one aspect I had zero to do with; the site provided them and I haven't had a reason to use them since (we usually do satellite T1 links for remote sites, or use SDSL for medium-term fixed emplacements), so other than saying Netopia was branded all over the boxes, I can't help further than a Google search would.
And the direct copying can change the addresses, so MASQ can still function as I understand it. To be honest, the direct copying of packets didn't drop the CPU load anywhere NEAR as much as simply having the cards separated across separate PCI buses, so the CPU could talk to each of the groups at the same time, instead of having to shout down the same piece of tin-can-and-string to everyone at once.
We did do what you described though; all the firewalling/IPsec/what-have-you was a separate set of rules between a pair of virtual ethernet devices.
The overall layout was this:
Arbitrary subnet gets VPNed/MASQed/etc. to a virtual ethernet address. Virtual ethernet gets firewalled to another virtual ethernet. Second virtual ethernet gets dynamically MASQed with connection-tracking to the 8 T1s, using QoS rules to send the traffic to whichever T1 had the lowest usage over the last minute or so.
Most of that's just shuffling headers around, which are tiny, and the final copy boiled down to a single MASQ and either getting passed on or dropped on the floor, which still works with fastcopy.
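To make the "lowest-usage T1 over the last minute" idea a bit more concrete, here is a minimal Python sketch of that selection logic. This is not the actual setup described above (that was done with Linux MASQ, connection tracking and QoS rules); the class name, method names and example addresses are all made up for illustration, and the sticky per-connection assignment is my assumption about what connection tracking buys you.

```python
from collections import deque


class LinkPicker:
    """Pick the uplink with the least traffic over a sliding time window."""

    def __init__(self, n_links: int = 8, window_s: float = 60.0):
        self.window_s = window_s
        # Per-link queue of (timestamp, byte count) samples inside the window.
        self.samples = [deque() for _ in range(n_links)]
        self.totals = [0] * n_links
        # Connections stay on the link they were first given (hypothetical
        # stand-in for connection tracking keeping a MASQed flow on one T1).
        self.assigned = {}

    def _expire(self, now: float) -> None:
        """Drop samples older than the window and subtract their bytes."""
        for i, q in enumerate(self.samples):
            while q and now - q[0][0] > self.window_s:
                _, nbytes = q.popleft()
                self.totals[i] -= nbytes

    def record(self, link: int, nbytes: int, now: float) -> None:
        """Account nbytes of traffic on a link at time now (seconds)."""
        self.samples[link].append((now, nbytes))
        self.totals[link] += nbytes

    def pick(self, conn, now: float) -> int:
        """Return the link for a connection; new ones go to the quietest link."""
        if conn in self.assigned:
            return self.assigned[conn]
        self._expire(now)
        link = min(range(len(self.totals)), key=self.totals.__getitem__)
        self.assigned[conn] = link
        return link


# Small walk-through with three links instead of eight.
picker = LinkPicker(n_links=3, window_s=60.0)
picker.record(0, 5000, now=0.0)
picker.record(1, 1000, now=10.0)
picker.record(2, 3000, now=20.0)

conn_a = ("10.0.0.5", 40000, "203.0.113.7", 80)   # example addresses only
conn_b = ("10.0.0.6", 40001, "203.0.113.7", 80)

first = picker.pick(conn_a, now=30.0)    # link 1 is quietest at this point
picker.record(first, 9000, now=31.0)
again = picker.pick(conn_a, now=40.0)    # same connection stays where it is
third = picker.pick(conn_b, now=65.0)    # link 0's old sample has aged out
```

The bookkeeping here is only timestamps and counters per link, which fits the point above that most of the work is shuffling small pieces of state around rather than touching packet payloads.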
And yes, tracking a couple thousand concurrent connections did eat up the memory. (2-4 per laptop, LONG story, client was using multiple bidirectional RealMedia streams to push an IRC-like live QA session at the Detroit Auto Show one year for vendors, so the presenter could ask questions and get realtime answers back without having to resort to a 'show of hands' count. Yes, we told them it was a bad design.)
As for cooling... At Detroit we had plenty of space, plenty of cooling, etc., etc. But to be quite honest we've literally shown up at a site, been informed they 'repurposed' our space for storage, and found we can barely squeeze a folding chair and a laptop into the space left for us, even with setting things on shipping crates. We gave up complaining and learned to expect (and equip ourselves) to be crammed in the equivalent of a furnace room with zero ventilation and space for one person to stand unseen as our minimal requirements for getting a live press event running for up to 12 hours at a stretch. Live press-style events are a bitch, but we do fairly well at supporting them.
There's a whole niche market for "stripped-down versions of Linux" that handle things like this.
Currently, I'm using Mikrotik RouterOS as a core router. It's at a small ISP -- 400 or so high-speed customers, 3000 dialup customers (400-500 of which are connected during peak times). Standard routing stuff (30 or so internal static routes, big deal). Couple hundred firewall rules (some for stopping Windows worms from spreading, some for general network security, some to help keep the nastier spammers in check). And BGP, taking a full BGP feed from our upstream, plus a couple multihops from places like Cymru's bogons project. And it doubles as a PPTP server so I can securely work from home (in a gesture of supreme irony, I can't get Internet connectivity from the company I work at).
And some other stuff I can't think of right now.
All this is running in a 1U system I got from eRacks (they make good cheap stuff), except for the hard drive, which I yanked and replaced with a 64MB IDE-flash drive from these guys. Celeron 1.3GHz, 512MB RAM. The system never ever, even during peak times, goes over 10% CPU load.
This isn't quite up to the specs the original author was looking for, mainly because this hardware isn't also doing the T1 stuff. (It's got plain old boring Ethernet to an older Cisco router, to which our four T1s are connected, but the Cisco is basically just a really big media converter.) But given how low the hardware utilization is on this unit, and how underpowered this system is as compared to current hardware, I think it shows that the notion is quite feasible.