Sharing a SCSI Drive Between Two Boxes Using Linux?
yppasswd asks: "I'm looking for a (cheap) solution for filesystem sharing between two linux servers and, since the target is just redundancy, I've come to the following idea: two SCSI controllers, one per machine, with different IDs (say 7 and 6) sharing the same disk. Only one of them would mount the disk, the other is just ready in case of failure. I've googled this around, and I've found many different opinions (Yes, no, perhaps, don't do it or it'll explode,...) but nobody saying 'Ok, I've tried this and here is what happened...'. Suggestions are welcome, but keep in mind that many other solutions (Fiber Channel, SSA, NFS mounts, various network filesystems) were already rejected because they were either too expensive, unreliable or not supported under Linux."
Why two separate boxes? Why not just RAID them? :(
Call me ignorant, but I don't understand.
Basically it's a cable that has two cables spliced together and three connectors. They are generally used to share SCSI devices between machines in a cluster.
Firewire is both inexpensive and reliable.
In theory this *SHOULD* work; where I see it failing is on switchover. Personally I would look at hardware that is less likely to fail, i.e. a Sun, IBM, etc. solution (yes, it's expensive). Otherwise, I would just keep a spare computer on hand: when things die on server 1, remove the external SCSI connector from box A, plug it in to box B, and mount the drive. Or possibly rig up an electronic A/B switch that trips on a signal from machine B, so that when machine B can no longer contact machine A, it flips the A/B switch and mounts the drive. Keeping them both connected to the chain at once will probably lead to serious problems in the event that machine A comes back, or when some other weird thing happens, as it is bound to. I would definitely recommend a switching solution over having them both connected to the drive chain.
The way the manual reads, it seems it should work in all supported OS's, but I cannot confirm that.
Rob
As a musician, I can say this is a common practice in my field -- one drive for sounds that the sampler can access as well as the computer in the mix.
Under ideal circumstances, SCSI can deal with multiple masters. I believe the SCSI standard allows for a lock on the drive until one master is finished, and then it releases the lock so that the next machine can access the drive. However, in practice most drives don't deal well with this. I've seen good SCSI drives killed because of conflicting signals...all because the musician got impatient waiting for his computer to write a sound file and let his sampler pull the sounds.
Again, these are much more plug and play than a Unix box will be, but the idea is still the same. If the SCSI driver on even one box ignores the lock the other master has, you've killed the drive.
Hope that was some help...I shouldn't even talk about drives as I just killed the one for my site. From about 20k of hits a to 0k...all because I screwed up my own backup. Ok, back to trying to recover my Ext2FS partition...
clif
I kind of wonder whether the server or the hard drive is more likely to fail, though.
:-)
The way I see it, the only thing this avoids is kernel failure. If the server fails, you're better off having something to restart it and a single box. If the *disk* fails (IMHO, by far the most likely, unless you're running a pretty flaky bit of server software), you're out of luck either way.
It seems like it might be a better idea to get two drives and one server (or two servers with two drives).
Good to see a good "Ask Slashdot", too.
You have to try it yourself. SCSI supports it, and technically nothing can be SCSI compliant if it won't work this way, but in practice... That is something else. I won't be at all surprised if one device fails to work that way, but a different one from the same manufacturer does. So test your setup before you go to production.
I've met people who claim to have done this, and even gone so far as to have half the disk used by one computer and half by the other (separate partitions), but those start to get into friend-of-a-friend territory, so I wouldn't put much faith in my claim that it has been done.
SCSI cabling is still something of a black magic, but use good cables, no pigtails, good termination, and you should be fine. There should be no need to watch for same-length cables; just get the termination right, and follow the rules. Note that I said should: SCSI cables are still mystical enough that I wouldn't call you a fool for following rules that appear technically bogus.
Hook it up. Should work. You could ask Ancot Corporation about this... They sent me a free booklet a while ago. "The basics of SCSI" You may still be able to get one on their website www.ancot.com.
Not all controllers or drives may be very excited about this setup, but I believe the standard says it should work. I know I've read about people doing it before (not sure about OS or hardware tho). Plug and chug. You should be able to find some combination that works, and since you aren't trying to mount at the same time from 2 machines - no problem.
You may even be able to mount different disks to different machines on the same chain - share a scanner, tape drive, cdrom, or Zip drive even. Just give it a shot man....
Solutions for sharing a disk amongst servers usually entail a SAN or fiber connection to the disks, and some really expensive software (read: Veritas Volume Manager and Veritas Cluster FS) to handle it all.
In the linux world take a look at GFS.
http://www.sistina.com/products_gfs.htm
The hardware they use to make it work will probably support what you're trying to do. Your typical off the shelf (At Frys) SCSI controller won't do the trick.
For what you're trying to do I highly recommend you work out some kind of sync between two networked machines with separate storage. If you're running a database it gets really fun. HINT for MySQL, script the replay of the SQL "update" log on the hot standby machine.
Good luck. My company just spent 150k+ on a Sun/Veritas solution to do exactly this. Our storage is all SAN.
--Chris
I guess I'm saying that I don't see why it wouldn't work on today's GNU/Linux systems.
you've given very little background on your setup. where most people would try to spread one computer's data over several drives, you are trying to spread one drive over multiple computers. i have no idea why you would want to do that, but this is what i can offer:
why don't you just find an extra computer and make an NFS server? the reason that you are not finding much information on sharing a SCSI drive is that there are a lot of better ways to do it. what sort of speed are you looking for? a 100Mbps network can deliver data comparable to having the drive attached locally, and you won't need an incredibly fast computer to serve it.
Somewhere on this page I have hidden my signature.
The difficulty you will have will be the software. You sound like you're not planning to have the same drive mounted on both systems at the same time, and that's good, and since you're using a Unix it sounds relatively simple to make sure that a drive is fully dismounted from one box before you mount it on the other. But very very bad things happen if, by some chance, both boxes do decide to mount a filesystem at the same time. If you have any sort of automatic failover between systems you have to be really really certain that the other box won't spring back to life and start writing to the filesystem while the other guy has it mounted. Supposedly reliable "failover" systems have this happen all the time if not designed correctly - remember, 99% of your failures will be software failures, not hardware failures, so if you design a hardware failover system without taking into account the flaky custom-written software you're making a mistake.
Yes, it's rare - but very valuable. Where I used to work we had about 4TB of hard disk space. Every disk (there were many - all SCSI, around 10GB each) was double-tailed. This allowed each disk to be connected to two controllers, and then each controller was connected to two mainframes. It's a redundancy thing - you're protecting against disk failure, host failure, and controller failure. For all those screaming NFS - all that does is move the problem. What happens when the hard disk controller in the NFS server dies? This way (ideally) if say somebody spills coffee on a hard disk controller (talk about a PITA), the disks are automatically switched over to the other controller. No down time.
Sounds like you just want to multi-init, I know the Adaptec stuff will let you do this (2940 and above?). Just look on google for multi initiated scsi or this
You're talking about making a shared storage HA-cluster. The company I work for makes software to do exactly that, and for typical applications, you can get it for *FREE*. Go to oss.missioncriticallinux.com and look at kimberlite. It makes sure that only one system is using the disk at a time, and automatically switches to the other machine when one breaks. It's also well documented, and the engineers that work on it are good about responding to questions over e-mail.
If you use debian, installation is as easy as apt-get install kimberlite. If you want to use it as an NFS server, you'll need to buy the commercial version for full support, but it's not very expensive.
Ignore the people in this thread who are talking out of their asses and saying multi-host scsi doesn't work well. They just didn't know how to set it up right or have never actually tried it. It's very common, and people have been using it for decades.
I mean honestly, what is more likely to die, a PC or a hard drive? I don't think this has been thought through all the way. It would at least have to be a shared raid array, not just a single shared drive. Preferably hot-swap.
Actually, what's most likely to fail is the software, but NIC failures, accidental cable pulls, and other hardware failures do happen. The trick is setting it up so there is no single point of failure. There are lots of papers available on the web that describe how to do this, and many of them talk about how to do it cheaply. You can set up two systems with no single point of failure (Redundant shared SCSI driver, host based RAID, dual NICs in each system, remote power control) and automatic failover for under $2500.
Note: I have never tried this before. Try it on a non-production machine first!!! you have been warned...
On the backup machine, write a script that repeatedly does the following actions:
1) mounts filesystem on shared disk read-only
2) if the mount fails because of an inconsistency, skip to 9
3) checks the mtime of a file called /.watchdog
4) determines if "too long" a period has gone by since that time... if not, go to 8
5) remounts the filesystem read-write
6) creates a file called "/.failover"
7) starts the application assuming the other computer has died, stops this script loop
8) umounts the filesystem
9) sleep for a short period of time
10) go back to 1
The main machine does the following things in a loop:
1) Update the date of /.watchdog
2) sleep for a short time (shorter than the one in the above loop)
3) Check for the existence of /.failover. If it exists, panic! This means the other machine decided to take over. Ideally you umount everything EXCEPT that disk and halt.
Now, a better idea might be something like this:
Create a small partition on the disk (1 cylinder) in addition to the shared partition.
Have the main machine write timestamps directly into the partition (date +%s > /dev/hdz3 or something). The backup machine would read that directly rather than trying to synchronize on a file (whose mtime will only be updated when the main machine's buffer cache is flushed to disk).
Also, you may want to consider some way to avoid needing a script loop on the host machine; a custom device driver that fits into Linux's watchdog timer framework is probably better.
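For anyone who wants to play with the timestamp-in-a-partition idea, here is a rough Python sketch of it. Treat it as an untested illustration rather than a working failover tool: `heartbeat_path` stands in for the small partition (the `/dev/hdz3` above), and the function names, the fixed record width, and the 30-second timeout are all assumptions made up for this example. A real setup would also have to make sure neither side is reading stale data out of its own buffer cache.

```python
import os
import time

STAMP_WIDTH = 20  # fixed-width record so each write overwrites the previous one


def write_heartbeat(heartbeat_path, now=None):
    """Primary: write the current Unix time at the start of the heartbeat area."""
    stamp = int(time.time() if now is None else now)
    record = str(stamp).rjust(STAMP_WIDTH).encode("ascii")
    # Open without truncating, as you would for a block device.
    fd = os.open(heartbeat_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)  # ask the kernel to push the record out to the device
    finally:
        os.close(fd)


def read_heartbeat(heartbeat_path):
    """Backup: return the last timestamp, or None if no valid record is there."""
    with open(heartbeat_path, "rb") as f:
        raw = f.read(STAMP_WIDTH)
    try:
        return int(raw.decode("ascii").strip())
    except ValueError:  # also covers undecodable bytes
        return None


def primary_looks_dead(last_stamp, now, timeout_seconds=30):
    """True only if a valid heartbeat exists and is older than the timeout."""
    if last_stamp is None:
        return False  # unreadable heartbeat: stay passive rather than guess
    return (now - last_stamp) > timeout_seconds
```

The backup's loop would call `primary_looks_dead(read_heartbeat(path), time.time())` on each pass and only go on to the read-write remount when it returns True. Erring towards "not dead" on a bad read is deliberate, since a false takeover is the expensive mistake here.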
Don Lancaster's (of TTL Cookbook fame) column in Computer Shopper? He wrote quite a bit about on-demand publishing back in the 80s & early 90s, and talked quite a bit about "shared SCSI comm" -- basically a SCSI drive connected to both a computer & a printer.
As someone else said, you want to look at "multi-initiator" support. Since there's not much point to using SCSI if you can't interleave requests, you're going to be talking about "split transactions" where the initiator arbitrates for the bus, selects a target and sends a command and possibly data (write case) over the bus and then disconnects. Later, the target arbitrates for the bus, selects the initiator (hopefully the same one that sent the request), and sends data (read case) and status back. IIRC, SCSI-I didn't support tagged queuing and out of order returns, but later versions do. This has got to be negotiated just like synchronous transfer rate. I can think of lots of ways that this could be screwed up (typically in firmware) and never affect the single initiator case, so as I said, you have to test.
If the drive fully and correctly supports the spec, it should respond correctly to requests from any initiator and keep everything straight when it agrees to handle tagged queuing. That means you should be able to use different parts of the disk for a filesystem on each host, as long as you keep everything straight. You can even have one device write and another read, or use some blocks on the disk to coordinate dynamic sharing, but all of that gets complicated quickly, so unless this is what you really want, it won't be worth it.
A couple of comments implied that some music systems do this sort of thing, maybe between the sound recording system and a computer mixer/processor system. Doing this can't break the drive, but it certainly could hose up the format enough to make it unusable without a reformat (if you break the usage rules, that is).
As to cables and such, SCSI is a bus, although you are allowed short taps from the bus to the drives/controllers (maximum is in the spec). If you have some sort of 'Y' cable that connects a host in two directions, you can't have more than one device inside the host (i.e. no drives inside the case connected to the internal port of the controller), and the internal cable has to be short enough (and of course no termination inside either). External drives and multi-drive modules will almost always have two connections for both ends of the bus, so just chain all the drives together and put the hosts on each end. Now you just have to be sure the total cable length is within spec (6 meters, I think).
The final topic is why do it in the first place. Keep in mind that drives and power supplies are your most likely failure points in any case, so you want to mirror, or RAID. Mirroring with one drive in each box (or many pairs split between the boxes) would reduce the single points of failure pretty well. You could even have both boxes active and mirroring to different pairs sharing the load until there is a failure, then switch over. Manual switch over is probably safest and cheapest: just shut down the broken system (if not already hard crashed), and mount the other filesystems on the still working box. If you have confidence in your monitoring system, you could script this on certain events.
It looked like some comments had good links to some multi-initiator stuff, or just google that as suggested (it helps when you know what to ask for), YMMV. Oh, one more thing to worry about: terminator power. Usually the controller supplies it to the bus, but it is very bad for more than one device or initiator to supply it. Of course, you also have to worry about still having it at both ends even if one of the machines is off or dead.
I seriously doubt this. I never heard SCSI was sensitive to cable lengths (within spec of course). The data goes in a buffer anyway, it's not like it's written to the media on the fly.
First off, the hard drive is likely to be the weakest link in your setup; making two separate processors depend on the same drive won't give you a lot of redundancy for your money.
Secondly, when the primary machine goes down, it may take cached disk information with it, so your secondary system will need to perform a fsck before mounting the drive, and the lag time probably won't help your situation.
What I would recommend is two separate systems, each with its own IDE (or SCSI) drive, and a gigabit network adapter in each machine. (I'd recommend using this as a secondary to your uplink ports, to make security easier and keep the bandwidth open.) Have the primary machine mount the secondary's drive over the network and mirror everything as it's written.
This is an old idea. "Poor man's clustering" is what they call it.
;-)
The essential trick that you may not think of yourself is to set the 2 SCSI host adapters to _different_ SCSI IDs. Most people forget this. Remember, the PCI SCSI adapter you use takes 1 SCSI ID in the chain, even if it's on the motherboard. So if you connect 2 PCs to the same SCSI chain, the ID of each PC's SCSI adapter needs to be different, otherwise it's no different than having two hard disks both set to ID 3.
2nd, make sure you terminate both ends and put both PCs inside the termination.
So your chain should look like this:
T-P7-6-5-4-3-2-1-P0-T
Where T is a Terminator, a number is a SCSI ID, and a P designates the SCSI adapter in a PC.
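If you want to sanity-check a planned chain written in this notation, something like the following Python sketch would do it. The function name and the exact checks are my own assumptions for illustration; it only looks at the two rules mentioned here (unique IDs, terminators at both ends and nowhere else) and knows nothing about cable length, bus width, or terminator power.

```python
def check_scsi_chain(layout):
    """Return a list of problems found in a layout like 'T-P7-6-5-4-3-2-1-P0-T'."""
    parts = layout.split("-")
    problems = []

    # Rule 1: a terminator at each end of the chain.
    if len(parts) < 2 or parts[0] != "T" or parts[-1] != "T":
        problems.append("both ends of the chain must be terminated")

    # Rule 2: every device, host adapters included, needs its own SCSI ID.
    seen = set()
    for part in parts:
        if part == "T":
            continue
        digits = part[1:] if part.startswith("P") else part
        if not digits.isdigit():
            problems.append(f"unrecognized entry: {part!r}")
            continue
        scsi_id = int(digits)
        if scsi_id in seen:
            problems.append(f"SCSI ID {scsi_id} is used more than once")
        seen.add(scsi_id)

    # Rule 3: no termination anywhere except the two ends.
    if "T" in parts[1:-1]:
        problems.append("terminator found in the middle of the chain")

    return problems
```

Calling `check_scsi_chain("T-P7-6-5-4-3-2-1-P0-T")` returns an empty list, while a chain where both PCs were left at the default ID 7 gets flagged as a duplicate.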
Good luck, and make sure you have enough goats!
Democracy. Whiskey. Sexy. Pick any two.
This is a troll!!!
(Or the gentleman is painfully ignorant).
Having done this myself in the real world I can say with complete authority that one should definitely use cards which support this configuration (e.g. Adaptec's). The reason being that these cards will actively negotiate which one has access to a given device at any particular time.
If you don't have cards that support this (which I didn't, so I found out the hard way) the SCSI devices will get confused and hang if they're accessed by both cards at the same time. Interestingly enough it did work, I just had to be careful what I did on the two machines.
(Better just to get the right cards and not have to worry about it constantly).
Hooking up SCSI devices twin-tailed to a pair of servers is not exactly rocket science, it's done every day. But if you just do that, all it's good for is backup servers connected to the same disk.
:-)
Keep in mind that although the electrical connections are OK (so long as only one thing is talking at a time on the SCSI bus), the filesystem is a different matter entirely: Without some sort of distributed lock manager, your data WILL get horked. Generally DLMs are part of larger packages like GFS, AFS/DFS, Coda, or Veritas ClusterFS. Tivoli's SANergy is probably the closest thing to a standalone product to do this, although there are others - I haven't looked at the market in nearly a year.
Filesystem consistency may be a serious enough problem to keep this approach from even being valuable for backup servers: If one server goes down unexpectedly, it leaves the disk in a corrupted state, which must first be fixed with fsck or the like. If you have to wait for that anyway, then there's not a whole lot of advantage to all that extra cabling and the weirdness that accompanies SCSI length.
Generally, the three best solutions today for this sort of thing are 1) cheap and easy: use external RAID boxes and just switch them over physically to a backup server, if required, 2) use iSCSI or other Storage over IP (SoIP) (or NAS, if you don't need performance) to allow disks to be easily reconnected, or 3) buy a fully virtualized SAN-type solution (which may be SCSI, Fibre Channel, or SoIP) that will allow you to re-connect everything in software - some of these can work with distributed lock managers.
If you really want to do this sort of thing, do it right: check out FalconStor or DataCore, or HPAQ's VaporStor, I mean, VersaStor...
"The future's good and the present is nothing to sneeze at." - Roblimo's last
Wow, the book is up to the fifth edition -- I have the second (IIRC) edition around somewhere. Plain blue cover...
In our case, we used Fibre Channel, but SCSI doesn't see anything interesting about controller vs device, so you should be able to have multiple machines connected to one SCSI chain. Machines at the end of a chain should be properly terminated.
We also used 'canned' failover software. It basically had a committed channel between the two boxes where they talked to each other and figured out who was up and who was 'active' (kinda like the protocol used by timed, the BSD protocol before ntpd). If the 'active' box died, then the backup box would take over as the server -- this included stealing the MAC and IP addresses and the disks.
Obviously, if the backup machine thought that the primary was dead when it wasn't, then all hell would break loose (yes, I had it happen to me).
Should you accept this mission, a journaling FS is obviously the better idea (faster FSCK before restarting the disks), and you REALLY want to make sure that the other machine is really down before the backup system grabs hold of the disks. IMHO, you're better off to err on the side of caution... Far easier to recover from the backup machine backing off from failover than trying to figure out what got destroyed by both machines writing to the same disks.
My best suggestion is to find some hardware hack to allow the two machines to pull each other's reset lines low. That way you can avoid the pathological case where the primary machine stalls long enough for the secondary to think it's dead, then comes back to life thinking that it's still primary (zombie servers -- appropriate for halloween night, don't you think?)...... Instant toasted disks.
Beyond making sure you don't end up with zombie servers, there shouldn't be anything special for Linux to do... Just FSCK the disks and mount them.
OS Software is like love: The best way to make it grow is to give it away.
Consider a situation where you have (a crude ASCII graph slashdot's lameness filter does not let me pass thru, depicting ~)
where n,m,o,p,q are integers bigger than one.
Each of the above is independently connected to each device in the next group.
Now take away all but one machine from each group (stupid luser access, sudden administrator movement, coffee pourance, spontaneous smoke escapitation event, divine intervention, anything you can come up with as long as it is considered a Fatal Failure on behalf of the concerned device). Does the system fail?
(Examining other setups of similar reliability is left as an exercise for the reader, except for that one who's already fed up with my style of writing.)
This is what the original question is about. I find it quite interesting that such a setup apparently could be achieved with commodity hardware and Free software.
Of Course you need off-this-machinery-and-rather-off-the-continent backup. It without the former, however, does not HA make.
I think, therefore thoughts exist. Ego is just an impression.
I'm looking for a (cheap) solution for filesystem sharing between two linux servers and, since the target is just redundancy, I've come to the following idea
Before you spend a single dollar, ask yourself: if your system is important enough to require fault tolerance, why can't you spend money to get a professional solution? If your system isn't important enough to spend money on, then ordinary bidirectional file replication should be good enough for you. You could do it with rsync and ntpd in a few minutes, for free.
I'm building a heartbeat cluster to serve WebGUI pages and files via samba.
This is going to be presented at a congress for the Netherlands Network User Group November 13th (a mostly Novell and Microsoft NT association).
I have been looking for a solution to mirror files between the two cluster nodes. SCSI is just too expensive for this, since low cost is one of the requirements. I've been trying to compile DRBD on my Gentoo 1.3 systems but the 2.4 kernel isn't supported by the default DRBD distribution yet.
Does anyone know about any other projects like these that actually work?
sig not found
If you're going to share a disk/fs between multiple machines, you will need a filesystem capable of performing proper file locking in order to avoid data corruption and race conditions.
Global File System (aka GFS) can do this. I believe that it was originally developed under an OSS license, but eventually went commercial. There are rumors of a GNU/GPL GFS (called OpenGFS) but I don't have many details as to the maturity of the project, or any experience with it at all.
I found GFS's learning curve to be pretty steep, but if I was able to set it up, I'm sure that you can work through it.
Lastly, I have only used GFS with a SAN cluster, connecting multiple machines via fabric fibre channel (you might want to consider using a third box as a RAID host). I know that you are using a very different solution than I did, on a different budget -- so YMMV.
I hope that this is helpful to you.
-Turkey
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
I'm curious as to why NFS was rejected, as it is supported in Linux, doesn't cost anything since it comes with Linux, and doesn't seem to have any issues other than security with RPC. We used to have a system with an NFS drive where about 4 people would play music off the drive, and we didn't have any performance issues....
We did this with what was then a huge SCSI disk storage (1.2 Gb shared between PCs in 1987 - just before the first 386 PC's came out) and SCSI supported it then. But SCSI won't protect you from conflicting update problems, so unless your OS disk sub-system understands what's going on, you'll have to use some discipline to make sure only one host is writing to the disk at once. You say the other machine is just for failover, so I'd suggest you tell the "failover" machine to mount the drive read-only, and then unmount and re-mount it RW only when you're actually failing-over.
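As a rough illustration of that read-only-then-remount discipline, here is a small Python sketch that just builds the command lines. The device and mount point in the usage note are placeholders, the function names are made up for this example, and nothing here checks that the other host has really stopped writing, which is the part that actually matters.

```python
def standby_mount_cmd(device, mount_point):
    """Command for the standby host: mount the shared disk read-only."""
    return ["mount", "-o", "ro", device, mount_point]


def failover_cmds(device, mount_point):
    """Commands to run at failover: drop the read-only mount, then mount read-write."""
    return [
        ["umount", mount_point],
        ["mount", "-o", "rw", device, mount_point],
    ]
```

On the standby you would run something like `subprocess.run(standby_mount_cmd("/dev/sdb1", "/mnt/shared"), check=True)` at boot, and walk through `failover_cmds(...)` the same way only once you are sure the primary is down. A full unmount and fresh mount, rather than an in-place remount, follows the suggestion above and also avoids trusting whatever the standby cached while it was read-only.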
Or, if you're after a cheap solution for failover (and it sounds like you'll be doing a manual failover) I'd just use external devices plugged into a SCSI card, and if you need to failover, manually unplug the disk from one machine and attach it to the other and boot it up. Not quite "hot standby", but quite warm...
I spent a lot of money on booze, birds and fast cars. The rest I just squandered. - George Best
Basically the servers monitor each other, and if the server that has mounted the drive goes down, the 2nd server picks it back up. (Oh, you only have to buy one server. Mirroring licenses are built into the product)
We streamed a video off the disk, then downed the server, and after a couple seconds the video picked right up where it had paused..Very cool.
Of course, that's actually while working on a third workstation....
I know this isn't helpful to the topic (Linux solution needed), but many people don't know it's possible.
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)
Dammit! People need to stop ignoring Novell.
Building a Poor Man's SCSI-Based Cluster Hardware System
There's much more information buried on their site, of course it applies to NetWare, but just because you don't have a Linux answer, doesn't mean it doesn't exist at all.
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)
Get yourself a copy of Red Hat Advanced Server 2.1 or Kimberlite from Mission Critical Linux. At least read the clustering whitepapers on these two sites!
Imagine the following scenario--
- the node "owning" the disk hangs
- the backup node takes over the connection and starts working
To finish the comment, imagine the following scenario:
- node A, which owns the drive, hangs
- node B takes over and starts writing
- node A recovers from its hang, thinks it still owns the disk, and goes back to writing happily
- you experience massive data corruption and get fired
- the economy sucks so you can't find a new job with "corrupted critical files because of my cheap-ass attempt to share a SCSI drive" on your resume
- your significant other leaves you, because he/she doesn't want to date an unemployed loser
- you spend the rest of your life alone and friendless, wishing you had heeded my sage advice
Clustering packages typically include features to STONITH (Shoot The Other Node In The Head) to prevent problems like this one. Red Hat Advanced Server and Kimberlite (on which RHAS clustering is based) include a number of other nice manageability features as well.
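The ordering that STONITH enforces can be sketched in a few lines of Python. This is only an illustration of the idea, not how Kimberlite or Red Hat Advanced Server implement it: the three callables (`fence_peer`, `mount_shared_disk`, `start_service`) are hypothetical hooks you would have to supply, for example a remote power switch for the fencing step.

```python
def take_over(fence_peer, mount_shared_disk, start_service):
    """Fence first, then mount, then start; give up if fencing is not confirmed.

    fence_peer must return True only when the other node is known to be off.
    """
    if not fence_peer():
        # Peer state unknown: it may still be writing, so do not touch the disk.
        return False
    mount_shared_disk()
    start_service()
    return True
```

The point of the early return is that the scenario above (node A waking up and writing again) cannot happen if node B refuses to mount until it has positive confirmation that A is powered off or reset.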
Good luck!
--JRZ
This one is an extreme case in point.
This is NOT off-topic, nor is it flame bait. Too many "ask slashdot" topics are themselves redundant.
Also - horking ugly - ugly enough to make you want to puke.
Been used in Canada for over 40 years.