Sharing a SCSI Drive Between Two Boxes Using Linux?

Posted by Cliff on Thursday October 31, 2002 @12:30PM from the those-crazy-ideas-that-MIGHT-proove-useful dept.

yppasswd asks: "I'm looking for a (cheap) solution for filesystem sharing between two linux servers and, since the target is just redundancy, I've come to the following idea: two SCSI controllers, one per machine, with different IDs (say 7 and 6) sharing the same disk. Only one of them would mount the disk, the other is just ready in case of failure. I've googled this around, and I've found many different opinions (Yes, no, perhaps, don't do it or it'll explode,...) but nobody saying 'Ok, I've tried this and here is what happened...'. Suggestions are welcome, but keep in mind that many other solutions (Fiber Channel, SSA, NFS mounts, various network filesystems) were already rejected because they were either too expensive, unreliable or not supported under Linux."

8 of 112 comments

IEEE 1394a by Hungus · 2002-10-31 12:43 · Score: 2, Interesting

Firewire is both inexpensive and reliable.

--
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK
my 2 cents by m0rph3us0 · 2002-10-31 12:50 · Score: 2, Interesting

In theory this *SHOULD* work, but where I see it failing is at switchover. Personally I would look at hardware that is less likely to fail, i.e. a Sun, IBM, etc. solution (yes, it's expensive). Otherwise, I would just keep a spare computer on hand: when things die on server 1, remove the external SCSI connector from box A, plug it into box B, and mount the drive. Or possibly rig up an electronic A/B switch that trips on a signal from machine B, so that when machine B can no longer contact machine A, it flips the switch and mounts the drive. Keeping them both connected to the chain at once will probably lead to serious problems if machine A comes back, or if some other weird thing happens (and it is bound to). I would definitely recommend a switching solution over having them both connected to the drive chain.
Cable length. by Trusty+Penfold · 2002-10-31 12:51 · Score: 0, Interesting

You would need to make sure that the 2 cables are exactly the same length. If they aren't then you'll run into two problems.

1) The obvious one; the signals will not arrive at the host computers or the disk at the same time. When the signal is going from the disk to the PC, this may not be a problem. When the signal is going from the PCs to the disk it is. If the difference introduces a delay of more than the time it takes to write 1 byte, then that information will be smeared across 2 bytes on the disk.

If you're saving pictures this will result in a blurred image (This is Joke! It will actually corrupt all files)

2) Less obvious, if the cables are different lengths then the signals may interfere when they meet at the controller. If a peak in the signal interferes with a trough in the other, then this will also result in incorrect data being written to the disk.
We're just not quite there yet by crstophr · 2002-10-31 13:20 · Score: 2, Interesting

Solutions for sharing a disk amongst servers usually entail a SAN or fiber connection to the disks, and some really expensive software (read: Veritas Volume Manager and Veritas Cluster FS) to handle it all.

In the Linux world, take a look at GFS.

http://www.sistina.com/products_gfs.htm

The hardware they use to make it work will probably support what you're trying to do. Your typical off the shelf (At Frys) SCSI controller won't do the trick.

For what you're trying to do I highly recommend you work out some kind of sync between two networked machines with separate storage. If you're running a database it gets really fun. HINT for MySQL, script the replay of the SQL "update" log on the hot standby machine.

Good luck. My company just spent 150k+ on a Sun/Veritas solution to do exactly this. Our storage is all SAN.

--Chris
How to do the failover... by Ayanami+Rei · 2002-10-31 14:28 · Score: 3, Interesting

Note: I have never tried this before. Try it on a non-production machine first!!! You have been warned...

On the backup machine, write a script that repeatedly does the following actions:

1) mounts the filesystem on the shared disk read-only
2) if the mount fails because of an inconsistency, skip to 9
3) checks the mtime of a file called /.watchdog
4) determines if "too long" a period has gone by since that time... if not, go to 8
5) remounts the filesystem read-write
6) creates a file called "/.failover"
7) starts the application assuming the other computer has died, stops this script loop
8) umounts the filesystem
9) sleep for a short period of time
10) go back to 1
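
A rough Python sketch of one pass through the steps above, for anyone who wants to reason about the control flow before touching real hardware. The mount, remount and umount actions are passed in as callables; those, and the names watchdog_is_stale, backup_cycle and backup_loop, are made up for illustration and are not from any existing tool. Like the steps themselves, this is untested against a real shared SCSI disk.

```python
import time
from typing import Callable, Optional


def watchdog_is_stale(watchdog_mtime: float, now: float, timeout: float) -> bool:
    """Step 4: True when the last heartbeat is older than `timeout` seconds."""
    return (now - watchdog_mtime) > timeout


def backup_cycle(mount_ro: Callable[[], bool],
                 read_watchdog_mtime: Callable[[], float],
                 remount_rw: Callable[[], None],
                 mark_failover: Callable[[], None],
                 umount: Callable[[], None],
                 timeout: float,
                 now: Optional[float] = None) -> bool:
    """Run steps 1-8 once. Return True if this node took over the disk."""
    if not mount_ro():                 # steps 1-2: mount failed, try again later
        return False
    current = time.time() if now is None else now
    if watchdog_is_stale(read_watchdog_mtime(), current, timeout):  # steps 3-4
        remount_rw()                   # step 5
        mark_failover()                # step 6: create /.failover
        return True                    # step 7: caller starts the application
    umount()                           # step 8: primary looks alive, let go
    return False


def backup_loop(cycle: Callable[[], bool], poll_interval: float) -> None:
    """Steps 9-10: sleep and repeat until a cycle reports a takeover."""
    while not cycle():
        time.sleep(poll_interval)
```

Keeping the staleness test in its own function makes it easy to check the timeout arithmetic without any disk attached; the real mount and umount calls would be supplied by whatever wrapper you write around mount(8).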

The main machine does the following things in a loop:
1) Update the date of /.watchdog
2) sleep for a short time (shorter than the one in the above loop)
3) Check for the existence of /.failover. If it exists, panic! This means the other machine decided to take over. Ideally you umount everything EXCEPT that disk and halt.
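
The main machine's loop could look something like the following Python sketch. The mount point is a parameter because the shared filesystem's location is not given here, and the function names are invented for the example. What to do on takeover (stopping services, halting) is left as a bare exit, since that part depends entirely on the setup.

```python
import os
import time


def touch_watchdog(mount_point: str) -> None:
    """Step 1: create .watchdog if missing and bump its mtime to now."""
    path = os.path.join(mount_point, ".watchdog")
    with open(path, "a"):
        os.utime(path, None)


def failover_requested(mount_point: str) -> bool:
    """Step 3: True once the backup has created .failover."""
    return os.path.exists(os.path.join(mount_point, ".failover"))


def primary_loop(mount_point: str, interval: float) -> None:
    """Heartbeat until the backup signals that it has taken over."""
    while True:
        touch_watchdog(mount_point)
        time.sleep(interval)   # step 2: keep this shorter than the backup's sleep
        if failover_requested(mount_point):
            raise SystemExit("backup took over: stop services and halt")
```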

Now, a better idea might be something like this:
Create a small partition on the disk (1 cylinder) in addition to the shared partition.
Have the main machine write timestamps directly into the partition (date +%s > /dev/hdz3 or something). The backup machine would read that directly rather than trying to synchronize on a file (whose mtime will only be updated when the main machine's buffer cache is flushed to disk).
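
Here is a small Python sketch of the raw-partition heartbeat, under a few assumptions of mine: the timestamp is stored as ASCII digits padded out to one 512-byte slot at offset 0, and the writer opens the device with O_SYNC so the write does not linger in its buffer cache. SLOT_SIZE and the function names are made up for the example. On a real block device the reading side may also have to get around its own cache, which this sketch does not attempt.

```python
import os
import time
from typing import Optional

SLOT_SIZE = 512  # assumption: one sector is plenty for a decimal timestamp


def write_heartbeat(device_path: str, now: Optional[int] = None) -> None:
    """Write the current Unix time into the first slot of the partition."""
    stamp = int(time.time()) if now is None else now
    payload = str(stamp).encode("ascii").ljust(SLOT_SIZE, b"\0")
    fd = os.open(device_path, os.O_WRONLY | os.O_SYNC)
    try:
        os.pwrite(fd, payload, 0)
    finally:
        os.close(fd)


def read_heartbeat(device_path: str) -> Optional[int]:
    """Return the stored timestamp, or None if the slot holds no valid number."""
    fd = os.open(device_path, os.O_RDONLY)
    try:
        raw = os.pread(fd, SLOT_SIZE, 0)
    finally:
        os.close(fd)
    text = raw.rstrip(b"\0").strip().decode("ascii", errors="replace")
    return int(text) if text.isdigit() else None
```

The backup would compare the value from read_heartbeat against its own clock, so the two machines' clocks need to agree to within the failover timeout.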

Also, you may want to consider some way to avoid needing a script loop on the host machine; a custom device driver that fits into Linux's watchdog timer framework is probably better.

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
consistency checks will take too long by Anonymous Coward · 2002-10-31 15:01 · Score: 1, Interesting

This is probably not a good idea. Sure, SCSI supports this in theory, but in the real world, where drives fail and servers crash, it will prove impractical.

If availability is your goal, this is not the way to get it. Get a good journalling file system, hardware RAID and then just replace a drive if it fails. You will find that bringing your one server back online will be faster than managing the switchover when a primary server fails. If you're running a database, the issue will be even more pronounced as any switchover will require that the server perform a consistency check on the database as well.
no by wotevah · 2002-10-31 16:55 · Score: 2, Interesting

I seriously doubt this. I never heard SCSI was sensitive to cable lengths (within spec of course). The data goes in a buffer anyway, it's not like it's written to the media on the fly.
DRBD network raid system anyone? by synq · 2002-11-01 00:55 · Score: 2, Interesting

I'm building a heartbeat cluster to serve WebGUI pages and files via Samba.

This is going to be presented at a congress for the Netherlands Network User Group on November 13th (a mostly Novell and Microsoft NT association).

I have been looking for a solution to mirror files between the two cluster nodes. SCSI is just too expensive for this, since low cost is one of the requirements. I've been trying to compile DRBD on my Gentoo 1.3 systems, but the 2.4 kernel isn't supported by the default DRBD distribution yet.

Does anyone know about any other projects like these that actually work?

--
sig not found