Domain: linux-ha.org
Stories and comments across the archive that link to linux-ha.org.
Comments · 72
-
Re:How about reading the announcement first?
Failover is part of High Availability and a different concept than Beowulf.
-
Re:Why maintain all that SysV cruft?
I really don't understand why every UNIX distribution isn't making these moves.
Some vendors started well before Sun.
IBM comes to mind. What do you think the L in AIX 5L stands for? HP has a different take but an interesting one for HP-UX 11i. Don't forget the company formerly known as SCO has Open Unix with LKP. Honestly I am sure Linux fits in with just about every other Unix vendor these days but you can do your own homework.
You might want to ease up on your Linux horse. I love the OS and spend most of my time there at home but it still has a ways to go before being truly competitive with Unix on the high end. HA clustering, Failover, Common Criteria certification, and widespread SAN vendor support are still lagging.
To pick on just one aspect of your RH tools + Solaris kernel theory, imagine adding heartbeat support to all those tools. Sure, HA on Linux is growing by leaps and bounds, but it will take time.
Don't even get me started about CC. -
Re:My big problem with Jabber...:LINUX HA
Check out the Linux High Availability project and most specifically their HEARTBEAT software. Basically, if any server running heartbeat goes down, in any of a variety of ways, a secondary or tertiary or n-ary machine takes over the tasks specified. It can work with any service that you place in /etc/rc.d so I don't see why Jabber wouldn't be part of that. -
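A minimal sketch of the takeover rule described in the comment above, assuming a simple priority list and a "deadtime" after which a silent node counts as down. The class and method names here are hypothetical and are not taken from the Heartbeat code itself.

```python
import time


class FailoverMonitor:
    """Tracks heartbeats and picks which node should run the services.

    Hypothetical illustration only; not the actual Heartbeat implementation.
    """

    def __init__(self, nodes, deadtime=10.0):
        self.nodes = list(nodes)   # ordered by priority, primary first
        self.deadtime = deadtime   # seconds of silence before a node is considered down
        self.last_seen = {}        # node name -> time of its last heartbeat

    def record_heartbeat(self, node, now=None):
        """Note that `node` was heard from at time `now` (defaults to the current time)."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def is_alive(self, node, now=None):
        """A node is alive if it has been heard from within the deadtime."""
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(node)
        return seen is not None and (now - seen) <= self.deadtime

    def active_node(self, now=None):
        """Return the highest-priority node that is still alive, or None."""
        for node in self.nodes:
            if self.is_alive(node, now):
                return node
        return None
```

Whichever node `active_node()` returns would be the one to start the services from /etc/rc.d; how those services are actually started is left out of the sketch.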
Re:Why wireless?
Doh,
High Availability Linux is the project that really could use a small device with dual NICs and a serial console that forwards traffic to one of several servers (knowing which are alive and free).
Changing the wireless card to something else would make it useful for someone else: solid-state web server, Bluetooth connecting point, digital camera print server... -
HA Linux
Take a peek over at linux-ha.org and do some reading.
Distributed filesystems like CODA and/or GFS might go quite a ways to solving some of your problems without resorting to implementing a filesystem on Oracle. In the end, what extra capabilities is Oracle going to give you that the right sort of filesystem wouldn't?
Plus, if your data should be stored in a database, then store it in a database. Don't store database data in a filesystem, and don't store filesystem data in a database. They're two very different ideas for data storage (hierarchical vs relational is just for starters).
Perhaps simply sitting back and doing a bit of reorganising and replication (produce some read-only mirrors using rsync, or use the network block device to do some network-raid or something) will solve all your problems. -
Autoconf does help portability to "normal" OS's.
I'm working on porting Linux HA (to Solaris and *BSD).
Thanks to Autoconf, some really nasty #if's in the code have been replaced by a single include line. It also simplifies the code by removing 50 trillion OS-specific checks from the source files and inserting them only where they're needed.
Once you get over the basic learning curve hurdles, then you should be fine.
It would be nice though if there was a makefile -> autoconf converter (but that is just me).
-
Re:From the thank-you-capt-obvious department....
But when it comes to making a mission-critical application, they're not going to allow them to run down to PC Joe's, pick up a $2k box, install a $30 OS and believe it will run 24/7 without failure.
No, they're going to allow you to call up Dell, buy 2 $1.5k boxes, and configure them for high availability. Of course, there are some labor costs involved here to get to Sun-equivalence in terms of guaranteed uptime, but I'm selling my labor, not Sun (or Dell, for that matter) hardware. And hardware + labor still comes in way below Sun.
-
Linux HA project
The Linux HA project is completely open to new members; I had a very good experience with them. Also, people who write good code are always wanted, since the project is stable enough for production but far from finished. So if you have time, check the web site.
-
Re:Auto upgrades
Well.. If it's a portion of a system that you are working on with others then it IS very important to keep up to date.
I am working on the Linux HA Project by porting it to FreeBSD (and helping out with Solaris as well)... and I need to keep my system totally up to date. -
Multicasting == streaming media
What might be the best multicasting application to use to be able to fully utilize the power of the cluster?
When you think about it, the applications of network multicasting (of content anyway) are pretty straightforward, and pretty much come down to 1-to-many streamed audio/visual content. Most of your ideas (distance learning, conferencing, etc.) assume that you have content to stream. As for distributed file share, etc., those all seem to me to be more applicable to unicast technologies.
If you're interested in multicast technologies, there are some very interesting low bandwidth applications. Heartbeats for distributed system applications are a good example. See the Linux-HA (High Availability) project for an application of this: linux-ha.org.
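As a rough sketch of the low-bandwidth multicast heartbeat idea, assuming plain UDP multicast and a small JSON payload. The group address, port, and message layout below are made up for illustration and are not what Linux-HA uses.

```python
import json
import socket

# Assumed values for illustration only: an administratively scoped
# multicast group and an arbitrary port.
GROUP = "239.255.42.99"
PORT = 5405


def encode_heartbeat(node, seq):
    """Serialize a heartbeat as a small JSON datagram."""
    return json.dumps({"node": node, "seq": seq}).encode("utf-8")


def decode_heartbeat(payload):
    """Parse a heartbeat datagram back into (node, seq)."""
    msg = json.loads(payload.decode("utf-8"))
    return msg["node"], msg["seq"]


def make_sender(ttl=1):
    """UDP socket for sending; TTL 1 keeps packets on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def make_receiver(group=GROUP, port=PORT):
    """UDP socket bound to the port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: group address followed by the local interface (any).
    mreq = socket.inet_aton(group) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def send_heartbeat(sock, node, seq, group=GROUP, port=PORT):
    """One datagram reaches every listener that has joined the group."""
    sock.sendto(encode_heartbeat(node, seq), (group, port))
```

Each node would call `send_heartbeat()` once a second or so and read from a `make_receiver()` socket to hear its peers; one small datagram per interval reaches every member, which is why this counts as a low-bandwidth use of multicast.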
Invisible Agent -
Re:Not a beowulf cluster
Not really, but combined with the HA stuff at Linux-HA, we can get a load-balanced, HA system. We have a running hack at Virtual Malaysia that does this. Basically, we put a Linux box at the front of a couple of NT web servers with monitoring software to check for downed boxes. We just have to add another box for failover and LVS + HA should be complete.
-
Re:Clustering ain't just Beowulf
Quite right. I guess the question asked just isn't specific enough; the setup needed for high availability failover is quite different from that of load-balancing / process-distribution high-performance clusters.
A very good place to start looking at what is available for Linux clustering is www.linux-ha.org.
Also worth mentioning if you are considering a high availability (active/standby) configuration: if there's more than one service to provide, you can get quite a nice performance boost by distributing the active and standby roles across the machines in your cluster. Having a database server for an ISP with Oracle active on one node and Postgres / MySQL active on the 2nd node gives you both great performance and high availability.
It means an active-standby configuration with a shared disk
Not necessarily; personally I like the solution of having separate, local RAID 0 (or RAID 5) disk arrays in each of the nodes and keeping them synchronized over the network.
- You don't need the special hardware for shared disk stuff.
- It's much easier to physically separate the nodes - all that's needed is a reliable network connection between the nodes.
- You avoid the single point of failure you'd get with the shared disk device.
For a practical implementation of disk synchronisation at the block device level, have a look at DRBD.
If you do want to go with shared media, you'd best consider two separate RAID 0 or RAID 5 devices, each connected to both boxes (separate SCSI bus for each device). The two devices are then configured as RAID 1 (mirror); if you throw in some SCSI separators you should be set. The aim is to avoid the problems that arise when a single device fails in a nasty way and renders the whole SCSI bus unusable.
You'll still want some additional hardware for your cluster: a good method for I/O fencing (guaranteeing that the two nodes never write to a device at the same time and scramble the data) is a really good idea. The easiest way to achieve this is to give each node control over the other's power supply: if a node decides it has to take over because the previously active node is no longer responding, it can power down, or at least power cycle, the other node to make sure it's REALLY down and not just hung for a few seconds.
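The fencing rule above boils down to an ordering constraint: power off the peer first, and only start writing once that has succeeded. A minimal sketch, assuming the power switch and the service start-up are supplied as callables (both hypothetical stand-ins, not a real power-switch API):

```python
def take_over(peer, fence, start_services):
    """Fence the unresponsive peer, then start services on this node.

    `fence(peer)` is assumed to return True only once the peer is confirmed
    powered off or power cycled; `start_services()` mounts the disks and
    starts the daemons. Both are hypothetical callables for illustration.
    """
    if not fence(peer):
        # Without a confirmed fence the peer might only be hung and could
        # wake up and write to the shared device, so do not take over.
        return False
    start_services()
    return True
```

Refusing to take over when fencing fails trades availability for data safety, which is the point of making sure the other node is REALLY down.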
Designing and building clusters can be fun :-)
-
another thing to want
On the Linux-HA mailing list, I suggested people ask IBM for the Phoenix clustering code. That contains a multi-node membership system, event distribution and a DLM. All that would be very handy for highly-available clusters, and is missing right now.
SGI previously released their FailSafe application monitoring and restart service. Having the Phoenix stuff underneath it and available for the GFS file system, and using the existing linux-ha bits would pretty much be a complete cluster solution. That would be good.
-dB
-
Re:Yay!!!
What about ext3 (which provides journalling) - hasn't that been released yet?
ext3 is available as kernel patches from ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/; there's still a bunch of issues to be aware of.
Pro: very nice transition from existing ext2 filesystems and back again. Does journaling so bye-bye long fsck times.
Con: Does data+metadata journaling so write performance is about 1/2 ext2. Must still be classed as experimental; I wouldn't yet go production with ext3. Reiser seems to be stable enough to use on production systems right now.
If you're interested in stuff that increases the availability of your system (journaling filesystems, hardware monitoring, cluster configurations..) the site to visit is http://www.linux-ha.org; it's got a nice collection of links to the relevant projects.
-
Dedicated is better, but...
Although I'd agree that a dedicated router box is generally better than running a box with a general-purpose OS, the decision isn't as clear-cut as some posters are making it out to be.
For modern hardware, there's no appreciable performance difference for the kinds of loads most people will see. For one of my clients, I set up dual Celeron-based Linux boxes as routers. One is the active router, and the other is a hot spare, automatically failing over if anything happens to the primary. (Kudos, BTW, to the folks at the High Availability Linux Project.)
This solution happily routes about 15 Mb/s around the clock, and I've tested it up to 100 Mb/s. Total cost for the pair was about $3200 in 1U rackmount cases. I can run all the latest Linux security tools on them. And other Linux sysadmins can work on them without learning, say, Cisco's arcane configuration language.
So a dedicated router may be better on the same hardware, but using a full-blown OS can make a lot of sense. -
Fail-over
In addition to linux-ha, which includes links to Linux Virtual Server, Piranha, Ultramonkey, you can also find organizations that do this for a living. One (the company I work for, to be honest) is Mission Critical Linux. Specify what your needs are, exactly (web service, database failover, file system, etc), then look around.
By the way, is your consultant a reseller of Solaris (since I see he suggested that)?
jeff -
LVS and Linux-HA
We have just been involved in creating a high availability clustered solution for a client, built entirely on Linux.
For this we used the Linux Virtual Server Project and also The Linux High Availability project.
This provides a great, resilient service; the project is live and running like a dream!
Don't believe what you hear from these overpriced consultants.
-
Re:Linux High Availability project
Notice there are many links to related HA items there at the Linux High Availability Project. It sounds as if you're looking for something like FAKE, which lets a machine acquire the IP of another machine in a failure (note that FAKE points out that it has been moved into the "Heartbeat" code at Linux-HA) -- although some link chasing is necessary to learn where it went.
"...dual-port NICs...switch the ports when the active port fails..."
Oh, I see. When one port (or its path) fails, you want to switch the IP to a different port? I don't think "the driver" needs to do that, just change the IP assignments with ifconfig.
- Monitor each link with some sort of heartbeat.
- When there is no response, assign the IP of that link to the backup interface. Just use ifconfig to alter the interface configuration.
- Have the backup interface be on the other NIC, not "switch ports" as you mentioned.
- Dual-port NICs are not needed, if you can fit 3-4 NICs in your machine.
- Have heartbeats running on backup and downed interfaces also, to report problems and repairs.
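The steps in the list above can be sketched roughly as follows, assuming each monitored link is described by its primary interface, its backup interface, and the address to move. The dictionary layout, the `is_up` probe, and the interface names are hypothetical; only the basic `ifconfig <interface> <address> netmask <mask> up` form is relied on.

```python
import subprocess


def ifconfig_assign(interface, address, netmask):
    """Build the ifconfig command that puts `address` on `interface`."""
    return ["ifconfig", interface, address, "netmask", netmask, "up"]


def fail_over(links, is_up, run=subprocess.check_call):
    """Move the address of every dead link to its backup interface.

    `links` is a list of dicts with the keys primary, backup, address and
    netmask (an assumed layout). `is_up(interface)` is whatever heartbeat
    check is in use. `run` executes a command; it defaults to actually
    calling ifconfig, which needs root.
    """
    moved = []
    for link in links:
        if not is_up(link["primary"]):
            run(ifconfig_assign(link["backup"], link["address"], link["netmask"]))
            moved.append(link["primary"])
    return moved
```

Passing a different `run` (for example `print`) gives a dry run that shows the commands without touching the interfaces. Moving the address back after a repair is left out of the sketch.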
-
Linux does support this...
You could go with an expensive commercial solution like BigIP from F5, but those will run you at least $30k or so. You could also use Polyserve Understudy, which does pretty much the same thing only under Linux, and it's only about $400 or so. If you have all this expensive Cisco equipment and a Cat6000, you can run Local Director on that without buying additional hardware.
However, I suggest:
http://www.linuxvirtualserver.org or
http://linux-ha.org or
http://www.eddieware.org
It all depends on the application you're running. If it's just http, any of these will work, but if it's something else, you're stuck with Linux-HA or Linux Virtual Server. Eddie will only do http as far as I know. Plus Eddie uses Erlang, which may affect performance. -
What you need
Everything you need is at High Availability Linux.
I too am building (and have built) a B2B exchange on the Linux platform and found JServ to be *INCREDIBLE* at HA/failover-safe features.
As for the 2 network cards for each machine, that too is a *VERY GOOD* thing. It allows you to partition out your network traffic to achieve much better response time. For example, our network has 2 NICs in each machine. There is a "Web Server to Database" network, there is a firewall-to-webserver network, and we have a separate network for office web surfing and misc stuff like that. Access to the "WebServer to Firewall" network is handled across the router.
One thing to keep in mind when dealing with DB-aware web applications is that unless your code is *VERY POORLY* written, the biggest bottleneck will be network latency.
-Grimsaado -
High-Availability Linux Project
Perhaps I've missed something, but the High-Availability Linux Project (http://www.linux-ha.org) already has similar goals for clustering and failover.
Wouldn't it be better to put more community effort into a "real" OpenSource (GPL'ed) solution instead of trying to port Irix's existing product and possibly getting a half-baked license?