Slashdot Mirror


Distributed Data Storage on a LAN?

AgentSmith2 asks: "I have 8 computers at my house on a LAN. I make backups of important files, but not very often. If I could create a virtual RAID by storing data on multiple disks on my network I could protect myself from the most common form of data failure - a disk crash. I am looking for a solution that will let me mount the distributed storage as a shared drive on my Windows and Linux computers. Then when data is written, it is redundantly stored on all the machines that I have designated as my virtual RAID. And if I loose one of the disks that comprise the raid, the image would automatically reconstruct itself when I add a replacement system to the virtual RAID. Basically, I'm looking to emulate the features of high-end RAIDs, but with multiple PCs instead of multiple disks within a single RAID subsystem. Are there any existing technologies that will let me do this?"

8 of 446 comments

  1. NBD Does this by backtick · · Score: 5, Insightful

    http://nbd.sourceforge.net/

    "Network Block Device (TCP version)

    What is it: With this thing compiled into your kernel, Linux can use a remote server as one of its block devices. Every time the client computer wants to read /dev/nd0, it will send a request to the server via TCP, which will reply with the data requested. This can be used for stations with low disk space (or even diskless - if you boot from floppy) to borrow disk space from other computers. Unlike NFS, it is possible to put any file system on it. But (also unlike NFS), if someone has mounted NBD read/write, you must assure that no one else will have it mounted.

    Limitations: It is impossible to use NBD as a root file system, as a user-land program is required to start it (but you could get away with initrd; I never tried that). (Patches to change this are welcome.) It also allows you to run a read-only block device in user-land (making server and client physically the same computer, communicating using loopback). Please note that read-write nbd with client and server on the same machine is a bad idea: expect deadlock within seconds (this may vary between kernel versions, maybe on one sunny day it will be even safe?). More generally, it is a bad idea to create a loop in the 'rw mounts graph'. I.e., if machineA is using a device from machineB read-write, it is a bad idea to use a device on machineA from machineB.

    Read-write nbd with client and server on the same machine has a rather fundamental problem: when the system is short of memory, it tries to write back dirty pages. So the nbd client asks the nbd server to write back data, but as nbd-server is a userland process, it may require memory to fulfill the request. That way lies the deadlock.

    Current state: It currently works. Network block device seems to be pretty stable. I originally thought that it was impossible to swap over TCP. It turned out not to be true - swapping over TCP now works and seems to be deadlock-free.

    If you want swapping to work, first get nbd working. (You'll have to mkswap on the server; mkswap tries to fsync, which will fail.) Now you have a version which mostly works. Ask me for kreclaimd if you see deadlocks.

    Network block device has been included in the standard (Linus') kernel tree since 2.1.101.

    I've successfully run raid5 and md over nbd. (A pretty recent version is required to do so, however.)"
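
    As a rough illustration of the request/reply exchange the quoted text describes, here is a minimal Python sketch. The magic numbers and header layout follow the classic (old-style) NBD wire format; the in-memory `disk` and the helper names (`build_read_request`, `serve_read`, `parse_reply`) are made up for this example, and no real socket or kernel device is involved.

```python
import struct

# Constants from the classic (old-style) NBD wire protocol.
NBD_REQUEST_MAGIC = 0x25609513
NBD_REPLY_MAGIC = 0x67446698
NBD_CMD_READ = 0

# Request header: magic, type, handle, offset, length (big-endian, 28 bytes).
REQUEST = struct.Struct(">IIQQI")
# Reply header: magic, error, handle (big-endian, 16 bytes).
REPLY = struct.Struct(">IIQ")


def build_read_request(handle: int, offset: int, length: int) -> bytes:
    """Pack a read request for `length` bytes starting at `offset`."""
    return REQUEST.pack(NBD_REQUEST_MAGIC, NBD_CMD_READ, handle, offset, length)


def serve_read(request: bytes, disk: bytes) -> bytes:
    """Toy server (hypothetical): answer one read request from an in-memory 'disk'."""
    magic, cmd, handle, offset, length = REQUEST.unpack(request)
    if magic != NBD_REQUEST_MAGIC or cmd != NBD_CMD_READ:
        raise ValueError("not a read request")
    # Any nonzero error code signals failure; 1 is an arbitrary choice here.
    error = 0 if offset + length <= len(disk) else 1
    payload = disk[offset:offset + length] if error == 0 else b""
    return REPLY.pack(NBD_REPLY_MAGIC, error, handle) + payload


def parse_reply(reply: bytes):
    """Split a reply into (error, handle, data)."""
    magic, error, handle = REPLY.unpack(reply[:REPLY.size])
    if magic != NBD_REPLY_MAGIC:
        raise ValueError("bad reply magic")
    return error, handle, reply[REPLY.size:]
```

    In a real setup the kernel client sends these headers over a TCP connection to nbd-server; here the bytes are simply passed between two functions.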

    1. Re:NBD Does this by dbarclay10 · · Score: 5, Informative

      Just to clarify what this guy is saying:

      1) Make all your machines NBD servers. NBD for Linux, NBD for Windows. NBD stands for "network block device" and allows a client to use a server's block device.
      2) Set up a master client/server (using Linux or something else with a decent software RAID stack). This machine will be the only NBD *client*, and it will use all the NBD block devices exported by the rest of your network.
      3) On the master set up in 2), create a Linux MD RAID array on top of all the NBD devices that are available.
      4) Create a filesystem on the brand-spanking-new multi-machine RAID array.
      5) Export it back to the other machines via Samba or NFS or AFS or what have you.
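
      For steps 2) to 4), a dry-run sketch in Python that only prints the commands the master might run. Everything specific here is an assumption: the host names, port 2000, the /dev/nbdN device names, RAID level 5, ext3, and the mount point. The old-style `nbd-client HOST PORT DEVICE` form is used; newer nbd-client versions may want an export name instead. Step 5) is ordinary Samba/NFS configuration and is not shown.

```python
def build_plan(exports, md_device="/dev/md0", level=5, mount_point="/mnt/netraid"):
    """Return the shell commands the master would run, in order (dry run only)."""
    if level == 5 and len(exports) < 3:
        raise ValueError("RAID 5 needs at least three member devices")
    plan, members = [], []
    # Step 2: attach every exported block device on the single NBD client.
    for index, (host, port) in enumerate(exports):
        device = f"/dev/nbd{index}"
        plan.append(f"nbd-client {host} {port} {device}")
        members.append(device)
    # Step 3: build one MD array on top of all attached NBD devices.
    plan.append(
        f"mdadm --create {md_device} --level={level} "
        f"--raid-devices={len(members)} " + " ".join(members)
    )
    # Step 4: put a filesystem on the array and mount it.
    plan.append(f"mkfs.ext3 {md_device}")
    plan.append(f"mount {md_device} {mount_point}")
    return plan


if __name__ == "__main__":
    # Hypothetical machines, each already running an NBD server on port 2000.
    for command in build_plan([("pc1", 2000), ("pc2", 2000), ("pc3", 2000)]):
        print(command)
```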

      Why does only one machine (the "master server") access the NBD devices, you ask? Because for a given block device, there can only be one client accessing it safely. Thus, if you want to make the RAID array available to anything other than the machine which is *running* the array off the NBD devices, you need to use something which allows concurrent access; something like NFS, Samba, or AFS.
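
      As for why a RAID 5 array built this way can survive losing one machine: the parity block is the XOR of the data blocks, so any single missing block can be recomputed from the rest. A toy Python sketch of just that idea follows; real MD RAID 5 works on stripes and rotates parity across members, which this ignores, and the function names are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block (data or parity) from all the others."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks stored on three machines
    parity = xor_blocks(data)            # block stored on a fourth machine
    # The machine holding data[1] dies; rebuild its block from the survivors.
    recovered = rebuild_missing([data[0], data[2], parity])
    print(recovered == data[1])
```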

      Hope that clears it up a bit.

      --

      Barclay family motto:
      Aut agere aut mori.
      (Either action or death.)
  2. Re:NBD Does this - NBD server for windows by flok · · Score: 5, Informative

    And since the guy is also using Windows boxes, an NBD server for Windows can be found here:
    http://www.vanheusden.com/Loose/nbdsrvr/
    This version enables you to also export partitions/disks.

    --

    www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi
  3. Most common form of data loss? by Anonymous Coward · · Score: 5, Insightful

    I'd dispute the claim that the most common form of data loss is a crashed hard disk.

    In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

    But I restore data accidentally deleted or changed by a user at least weekly! A distributed storage system won't help you there.

    However, I will grant that the average /. user knows what they're doing with their data far more than my average user does and is less likely to cause self-inflicted damage.

  4. Intermezzo by mikeee · · Score: 5, Informative

    Intermezzo is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back.

    It isn't particularly high-performance, from what I know, and may be more complexity than you need.

  5. You aren't gonna get a real RAID. by PurpleFloyd · · Score: 5, Insightful

    First off, you aren't going to be able to use this like a real RAID array (a drive can die and you keep on working). The latency and bandwidth of any network that could be reasonably implemented in your home is going to prevent your system from acting like a real RAID array.
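
    To put a rough number on the bandwidth point, here is a back-of-envelope Python calculation. The 80 GB disk and 100 Mbit/s Ethernet figures are assumptions for illustration, not from the original question, and the estimate ignores latency and protocol overhead unless you lower `efficiency`.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to push `size_gb` gigabytes through a network link."""
    bytes_total = size_gb * 1e9
    # Convert megabits per second to bytes per second, scaled by efficiency.
    bytes_per_s = link_mbit_per_s * 1e6 / 8 * efficiency
    return bytes_total / bytes_per_s / 3600


if __name__ == "__main__":
    # Assumed figures: resyncing one 80 GB member over 100 Mbit/s Ethernet.
    print(round(transfer_hours(80, 100), 2))
```

    Under those assumptions a full resync of one member takes well over an hour even with the link fully saturated, and every ordinary write to the array also has to cross the same network.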

    Instead of trying to implement a shoestring SAN, go the simple route: throw up a Linux box running Samba for your "backup server;" it doesn't need much horsepower, just fairly fast drives and a network connection. Then schedule copies of your documents and home directories (using a cron-type tool on Linux and XCOPY called by the Task Scheduler on Windows, you should be able to hack something together that copies only changed files) every night at midnight, or some other time when you aren't using your computers. Although you might lose a bit of work if the system goes down, you won't ever lose more than 24 hours' worth.
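
    A minimal Python sketch of the "copy only changed files" idea, as one possible alternative to an XCOPY or shell script. It assumes the backup share is already mounted at some local path; `copy_changed` and the paths in the usage note are hypothetical. A file is treated as changed if it is missing on the destination, differs in size, or is newer than the copy.

```python
import os
import shutil


def copy_changed(src_root: str, dst_root: str) -> list:
    """Copy files that are missing from dst_root or differ in size/mtime."""
    copied = []
    for folder, _subdirs, files in os.walk(src_root):
        rel = os.path.relpath(folder, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(target_dir, name)
            s = os.stat(src)
            if os.path.exists(dst):
                d = os.stat(dst)
                # Same size and not older than the source: treat as unchanged.
                if d.st_size == s.st_size and d.st_mtime >= s.st_mtime:
                    continue
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)
```

    Scheduled nightly from cron or the Task Scheduler, a call such as `copy_changed("/home/me", "/mnt/backup/me")` would only move what changed since the last run. It never deletes anything on the destination, so files removed from the source linger in the backup.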

    If you have more money to blow, then I would suggest that you invest in an honest-to-dog hardware RAID card and some good drives and put them into a server, then do everything across the network (put the /home tree and My Documents folders on the server). You can of course mount the /home directory in Linux via NFS or smbmount, and Group Policy in Windows 2K/XP will allow you to change the location of the My Documents folder to whatever you choose. You might be able to do the same via the System Policy Editor on 9x; it's been a while and I can't find the information after a brief Google.

    To sum up:

    • Don't blow millions on a SAN for your house.
    • Cheap route: cron jobs/Windows task scheduler to copy important folders across the network every night
    • More expensive route: invest in a server with real RAID, then mount your important directories from that.
    --

    That's it. I'm no longer part of Team Sanity.
  6. Re:Intermezzo by laursen · · Score: 5, Informative

    "Intermezzo is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back."

    We have looked at various distributed filesystems for use in a clustered setup of webservers. We wanted to remove the single point of failure from a central NFS server - Intermezzo was one of the filesystems we had a look at.

    The idea behind Intermezzo is fairly simple and the documentation is good. The Intermezzo system looked like an ideal solution for our setup (Coda and OpenAFS are far too complex for use in a distributed filesystem on a closed internal net).

    We tested the system but sadly it's not really production stable and I can't advise that you use it.

    If you are looking for a SAFE solution then Intermezzo is not for you - you will just end up with garbled data, deadlocks and tons of wasted time ...

    My 2 cents.

  7. Why? by Illbay · · Score: 5, Funny

    "...if I loose one of the disks that comprise the raid, the image would automatically reconstruct itself..."

    Why would you want to "loose" one of the disks? Don't you know they're supposed to stay tightly enclosed in their little boxes?

    And why do you think that "loosing" the disk would help the image "automatically reconstruct itself?"

    Actually, if you did that the disk would carom around the room like a very fast, very lethal Frisbee and you would be too busy trying to survive to worry about where your data went!

    Just a thought

    Otherwise, your plan sounds peachy.

    --
    Any technology distinguishable from magic is insufficiently advanced.