Sharing a SCSI Drive Between Two Boxes Using Linux?

Posted by Cliff on Thursday October 31, 2002 @12:30PM from the those-crazy-ideas-that-MIGHT-proove-useful dept.

yppasswd asks: "I'm looking for a (cheap) solution for filesystem sharing between two Linux servers and, since the target is just redundancy, I've come to the following idea: two SCSI controllers, one per machine, with different IDs (say 7 and 6), sharing the same disk. Only one of them would mount the disk; the other is just ready in case of failure. I've googled this around, and I've found many different opinions (yes, no, perhaps, don't do it or it'll explode,...) but nobody saying 'Ok, I've tried this and here is what happened...'. Suggestions are welcome, but keep in mind that many other solutions (Fibre Channel, SSA, NFS mounts, various network filesystems) were already rejected because they were either too expensive, unreliable or not supported under Linux."

How to do the failover... by Ayanami+Rei · 2002-10-31 14:28 · Score: 3, Interesting

Note: I have never tried this. Try it on a non-production machine first! You have been warned.

On the backup machine, write a script that repeatedly does the following actions:

1) mount the filesystem on the shared disk read-only
2) if the mount fails because of an inconsistency, skip to 9
3) check the mtime of a file called /.watchdog
4) determine whether "too long" a period has gone by since that time; if not, go to 8
5) remount the filesystem read-write
6) create a file called "/.failover"
7) start the application, assuming the other computer has died, and stop this script loop
8) unmount the filesystem
9) sleep for a short period of time
10) go back to 1
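
The backup loop above could be sketched roughly as follows. This is untested against real shared SCSI hardware. The device name, mount point, timeout and poll interval are all made-up placeholders, and treating a missing /.watchdog as "do nothing" is an assumption, since the steps do not say what to do in that case. It also assumes the two machines' clocks are roughly in sync, because the mtime comes from the main machine's clock.

```python
import os
import subprocess
import time

# All of these values are illustrative assumptions, not from the original post.
DEVICE = "/dev/sdb1"
MOUNT_POINT = "/mnt/shared"
TIMEOUT = 30.0        # what counts as "too long", in seconds
POLL_INTERVAL = 5.0   # the "short period of time" in step 9


def primary_looks_dead(watchdog_mtime, now, timeout=TIMEOUT):
    """Step 4: has "too long" passed since /.watchdog was last updated?"""
    return (now - watchdog_mtime) > timeout


def poll_once(device=DEVICE, mount_point=MOUNT_POINT):
    """Run steps 1-8 once. Return True if this machine took over."""
    # Steps 1-2: mount read-only; on failure, fall through to the sleep.
    if subprocess.run(["mount", "-o", "ro", device, mount_point]).returncode != 0:
        return False
    try:
        # Step 3: read the mtime of /.watchdog on the shared filesystem.
        mtime = os.stat(os.path.join(mount_point, ".watchdog")).st_mtime
    except FileNotFoundError:
        mtime = None  # Assumption: a missing file does not trigger a takeover.
    if mtime is not None and primary_looks_dead(mtime, time.time()):
        # Steps 5-6: remount read-write and leave the takeover marker.
        subprocess.run(["mount", "-o", "remount,rw", mount_point], check=True)
        with open(os.path.join(mount_point, ".failover"), "w"):
            pass
        return True  # Step 7: the caller starts the application.
    # Step 8: the main machine still looks alive, so let go of the disk.
    subprocess.run(["umount", mount_point], check=True)
    return False


def run():
    # Steps 9-10: sleep, then try again until a takeover happens.
    while not poll_once():
        time.sleep(POLL_INTERVAL)
```

Calling run() as root on the backup machine would block until a takeover, after which the application can be started. The staleness check is kept as a separate function so it can be tried without touching any disk.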

The main machine does the following things in a loop:
1) Update the date of /.watchdog
2) sleep for a short time (shorter than the one in the above loop)
3) Check for the existence of /.failover. If it exists, panic! This means the other machine decided to take over. Ideally you unmount everything EXCEPT that disk and halt.
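
A minimal sketch of the main machine's side, under the same caveats: the mount point and interval are placeholders, and what "panic" actually does (unmounting, halting) is left to the caller. The fsync call is an addition of this sketch. It asks the kernel to push the file out to disk instead of waiting for a periodic flush.

```python
import os
import time

# Illustrative assumptions, not from the original post.
ROOT = "/mnt/shared"   # where the shared filesystem is mounted read-write
INTERVAL = 2.0         # must be shorter than the backup's poll interval


def touch_watchdog(root=ROOT):
    """Step 1: update the mtime of /.watchdog, creating it if needed."""
    path = os.path.join(root, ".watchdog")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.utime(path)   # set atime and mtime to the current time
        os.fsync(fd)     # ask the kernel to write the file out now
    finally:
        os.close(fd)


def failover_requested(root=ROOT):
    """Step 3: has the backup machine left its /.failover marker?"""
    return os.path.exists(os.path.join(root, ".failover"))


def run(root=ROOT, interval=INTERVAL):
    """Loop until /.failover shows up, then return so the caller can halt."""
    while True:
        touch_watchdog(root)
        time.sleep(interval)
        if failover_requested(root):
            return
```

One thing to watch: the main machine looks for /.failover through its own cached view of the filesystem, so it may not notice a file that the other machine wrote straight to the disk.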

Now, a better idea might be something like this:
Create a small partition on the disk (1 cylinder) in addition to the shared partition.
Have the main machine write timestamps directly into that partition (date +%s > /dev/hdz3 or something). The backup machine would read the partition directly rather than trying to synchronize on a file (whose mtime will only be updated on disk when the main machine's buffer cache is flushed).
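
The raw-partition heartbeat might look something like this. The 512-byte record size and the zero-padded ASCII layout are choices made for this sketch, not anything from the post, and the device path is whatever small partition you set aside. The writer opens the device with O_SYNC so the timestamp is not left sitting in the main machine's cache. The reader is a plain read, so on a real block device the backup would probably also need to bypass its own cache (O_DIRECT, for example), which is not attempted here.

```python
import os
import time

RECORD_SIZE = 512  # Assumption: one sector's worth, padded with NUL bytes.


def write_heartbeat(device, now=None):
    """Main machine: write the current Unix time at the start of the device."""
    stamp = int(time.time() if now is None else now)
    record = str(stamp).encode("ascii").ljust(RECORD_SIZE, b"\0")
    # O_SYNC makes each write wait until the data has been sent to the disk.
    fd = os.open(device, os.O_WRONLY | os.O_SYNC)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)


def read_heartbeat(device):
    """Backup machine: read the timestamp back as an integer."""
    with open(device, "rb") as f:
        raw = f.read(RECORD_SIZE)
    # Drop the NUL padding and any trailing newline before parsing.
    return int(raw.rstrip(b"\0").strip().decode("ascii"))
```

The backup would then compare read_heartbeat() against its own clock the same way it compared the mtime of /.watchdog, without having to mount anything.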

Also, you may want to consider some way to avoid needing a script loop on the host machine; a custom device driver that fits into Linux's watchdog timer framework is probably better.
