Slashdot Mirror


Distributed Data Storage on a LAN?

AgentSmith2 asks: "I have 8 computers at my house on a LAN. I make backups of important files, but not very often. If I could create a virtual RAID by storing data on multiple disks on my network I could protect myself from the most common form of data failure - a disk crash. I am looking for a solution that will let me mount the distributed storage as a shared drive on my Windows and Linux computers. Then when data is written, it is redundantly stored on all the machines that I have designated as my virtual RAID. And if I loose one of the disks that comprise the RAID, the image would automatically reconstruct itself when I add a replacement system to the virtual RAID. Basically, I'm looking to emulate the features of high-end RAIDs, but with multiple PCs instead of multiple disks within a single RAID subsystem. Are there any existing technologies that will let me do this?"

79 of 446 comments

  1. NBD Does this by backtick · · Score: 5, Insightful

    http://nbd.sourceforge.net/

    "Network Block Device (TCP version)

    What is it: With this thing compiled into your kernel, Linux can use a remote server as one of its block devices. Every time the client computer wants to read /dev/nd0, it will send a request to the server via TCP, which will reply with the data requested. This can be used for stations with low disk space (or even diskless - if you boot from floppy) to borrow disk space from other computers. Unlike NFS, it is possible to put any file system on it. But (also unlike NFS), if someone has mounted NBD read/write, you must assure that no one else will have it mounted.

    Limitations: It is impossible to use NBD as root file system, as a user-land program is required to start (but you could get away with initrd; I never tried that). (Patches to change this are welcome.) It also allows you to run read-only block-device in user-land (making server and client physically the same computer, communicating using loopback). Please notice that read-write nbd with client and server on the same machine is bad idea: expect deadlock within seconds (this may vary between kernel versions, maybe on one sunny day it will be even safe?). More generally, it is bad idea to create loop in 'rw mounts graph'. I.e., if machineA is using device from machineB readwrite, it is bad idea to use device on machineB from machineA.

    Read-write nbd with client and server on the same machine has rather fundamental problem: when system is short of memory, it tries to write back dirty page. So nbd client asks nbd server to write back data, but as nbd-server is userland process, it may require memory to fulfill the request. That way lies the deadlock.

    Current state: It currently works. Network block device seems to be pretty stable. I originally thought that it is impossible to swap over TCP. It turned out not to be true - swapping over TCP now works and seems to be deadlock-free.

    If you want swapping to work, first make nbd working. (You'll have to mkswap on server; mkswap tries to fsync which will fail.) Now, you have version which mostly works. Ask me for kreclaimd if you see deadlocks.

    Network block device has been included into standard (Linus') kernel tree in 2.1.101.

    I've successfully run raid5 and md over nbd. (Pretty recent version is required to do so, however.) "

    1. Re:NBD Does this by dbarclay10 · · Score: 5, Informative

      Just to clarify what this guy is saying:

      1) Make all your machines NBD servers. NBD for Linux, NBD for Windows. NBD stands for "network block device" and allows a client to use a server's block device.
      2) Set up a master client/server (using Linux or something else with a decent software RAID stack). This machine will be the only NBD *client*, and it will use all the NBD block devices exported by the rest of your network.
      3) On the master set up in 2), create a Linux MD RAID array on top of all the NBD devices that are available.
      4) Create a filesystem on the brand-spanking-new multi-machine RAID array.
      5) Export it back to the other machines via Samba or NFS or AFS or what have you.

      Why does only one machine (the "master server") access the NBD devices, you ask? Because for a given block device, there can only be one client accessing it safely. Thus, if you want to make the RAID array available to anything other than the machine which is *running* the array off the NBD devices, you need to use something which allows concurrent access; something like NFS, Samba, or AFS.
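
      The five steps above can be sketched as a dry run on the master. This is only a sketch under assumptions: the host names, port 2000, the /dev/nbdN and /dev/md0 device names, the ext3 filesystem and the /export/raid mount point are all made up for illustration, and it assumes the old-style "nbd-client host port device" invocation (newer nbd-client releases use named exports instead). The function only prints the commands; it does not run them.

```shell
#!/bin/sh
# Dry run: print the commands the master would run to build a RAID 5
# array on top of NBD devices exported by other machines.
# Host names, port, device names and mount point are placeholders.
# Step 1 (running an NBD server on every other box) happens on those boxes.
plan_nbd_raid() {
    port=$1; shift
    n=0
    devices=""
    for host in "$@"; do
        # Step 2: attach each remote export as a local block device.
        echo "nbd-client $host $port /dev/nbd$n"
        devices="$devices /dev/nbd$n"
        n=$((n + 1))
    done
    # Step 3: build one md array across all attached NBD devices.
    echo "mdadm --create /dev/md0 --level=5 --raid-devices=$n$devices"
    # Step 4: put a filesystem on the array and mount it.
    echo "mkfs.ext3 /dev/md0"
    echo "mount /dev/md0 /export/raid"
    # Step 5: share /export/raid through Samba or NFS (configuration not shown).
}

plan_nbd_raid 2000 boxa boxb boxc
```

      Printing instead of executing keeps the sketch safe to try on any machine; pipe the output into a root shell only after checking every line against your own setup.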

      Hope that clears it up a bit.

      --

      Barclay family motto:
      Aut agere aut mori.
      (Either action or death.)
    2. Re:NBD Does this by caluml · · Score: 2, Informative

      Hmm. How stable is it? From /usr/src/linux/Documentation/nbd.txt:

      Note: Network Block Device is now experimental, which approximately
      means, that it works on my computer, and it worked on one of school
      computers.

      That doesn't sound very promising to me. Usually stuff that's been in the kernel since 2.1 days is rock solid.

      Isn't AFS/Coda more like what the guy wants (excluding Windows-ability, although I seem to remember there being something for Andrew for Windows)?

    3. Re:NBD Does this by WindBourne · · Score: 2, Interesting

      I currently do this at home with 3 computers (all Linux) for my home directory. But I have been thinking that there needs to be a way to separate parts of etc for the local system vs. the network. I have been thinking of how to write a block device that allows layers to be combined.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    4. Re:NBD Does this by arivanov · · Score: 2, Insightful

      There are inherent pitfalls in it. They are mostly similar to the problem of swapping over NFS. Overall, it boils down to buffer management.

      Basically, in order to execute the network device request you often have to get more memory. In order to get more memory you have to execute a network request. And so on and so forth.

      Also, AFAIK RAID does not work properly over NBD.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
  2. Win2k by SuiteSisterMary · · Score: 4, Informative

    I believe that Windows 2000's Distributed File System allows you to do just this.

    --
    Vintage computer games and RPG books available. Email me if you're interested.
    1. Re:Win2k by SuiteSisterMary · · Score: 2

      If you look further into DFS, I believe you'll find that you can have multiple servers synchronizing the same share name.

      It's pretty snazzy; it'll even try to figure out the 'closest' server to you at any given time, skip over servers that are down, and so on.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
    2. Re:Win2k by devilspgd · · Score: 2, Interesting

      From my reading of DFS prior to W2K/AD's release, it was mainly built for large mostly static data which needs to be replicated across multiple sites and needs high uptime, but very specifically does not need to be updated frequently.

      The concept of giving all users read/write access was thought up later on and it happens to work, but as you say, if two users update the same file, you may/will lose data.

      --
      Give a man a fish, he'll eat for a day, but teach a man to phish...
  3. rdist would work... by ZenShadow · · Score: 4, Informative

    The obvious answer for this is nbd, as pointed out in another post -- but I would have concerns about speed with that kind of setup. I'd be interested in hearing reports on that.

    But if you don't want to get into nbd, you can tolerate delayed writes to your virtualized disks, and all you want is the network equivalent of RAID level 1, then you could always just set up an rdist script that synchronizes your local data disk with a remote repository (or eight) every so often...

    --ZS

    --
    -- sigs cause cancer.
  4. Standard Linux kernel maybe? by buzzbomb · · Score: 2

    Perhaps multiple files over different networking protocols (SMB for Windows machines, NFS for the Linux machines) mapped to built-in loopback devices (/dev/loopX) accessed through built-in md utilizing software RAID5? Heh. It might not be pretty or fast, but it would probably work just fine. It may just give the kernel absolute fits though.

    Anyone tried this?
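
    For anyone who wants to try it, the idea might look roughly like the dry run below. It is a sketch under assumptions: the mount points, the 1024 MB image size, the raid.img file name and the /dev/loopN and /dev/md0 device names are invented for illustration, and the shares are assumed to be mounted already. The function prints the commands rather than running them.

```shell
#!/bin/sh
# Dry run: print commands for RAID 5 over image files that live on
# network mounts. Mount points, sizes and device names are placeholders.
plan_loop_raid() {
    size_mb=$1; shift
    n=0
    devices=""
    for mountpoint in "$@"; do
        # One backing file per remote share (SMB or NFS, already mounted).
        echo "dd if=/dev/zero of=$mountpoint/raid.img bs=1M count=$size_mb"
        # Expose the file as a block device through the loop driver.
        echo "losetup /dev/loop$n $mountpoint/raid.img"
        devices="$devices /dev/loop$n"
        n=$((n + 1))
    done
    # Combine the loop devices into one software RAID 5 array.
    echo "mdadm --create /dev/md0 --level=5 --raid-devices=$n$devices"
}

plan_loop_raid 1024 /mnt/win1 /mnt/linux1 /mnt/linux2
```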

    1. Re:Standard Linux kernel maybe? by backtick · · Score: 3, Informative

      NBD *is* standard Linux kernel. It's built right in: /usr/src/linux-2.4/Documentation/nbd.txt

      If you're curious about using the enhanced NBD w/ failover and HA, you can read about it at:

      http://www.it.uc3m.es/~ptb/nbd/#How_to_make_ENBD_work_with_heartbeat

  5. AFS by Reeses · · Score: 4, Informative

    It's called the Andrew File System.

    http://www.psc.edu/general/filesys/afs/afs.html

    There's another alternative with a different name, but I forget what it's called.

    --
    Reeses
    1. Re:AFS by fireboy1919 · · Score: 4, Interesting

      In my experience, it's one of those "it would be a wonderful thing if it worked."

      It requires its own partition for each mount of it; you can't just share disks you've already got.

      Setup also takes hours, and it probably won't work the first time. Online documentation is incredibly outdated, which doesn't help matters at all. It also takes a hefty chunk of computer to run it, because it requires a lot of watchdog type programs to fix the frequent corruption that happens to it as you use it.

      The servers' time has to be matched exactly, so it's also best if you've got an NTP server running and clients on all the machines.

      It's also about ten times slower than Samba (which you might use instead to share with Windows machines), and it chokes when you try to move/copy/delete large files.

      I tried it for a month before it completely corrupted its own partition and I switched back to NFS and Samba.

      I can't wait for the day when these problems are but a memory and such a system works flawlessly.

      --
      Mod me down and I will become more powerful than you can possibly imagine!
    2. Re:AFS by Strange+Ranger · · Score: 4, Informative
      from karmak.org
      AFS is based on a distributed file system originally developed under a different name in the mid-1980's at the Information Technology Center of Carnegie-Mellon University (CMU). It was first publicly described in a paper in 1985, and soon afterwards was renamed to the "Andrew File System" in honor of the patrons of CMU, Andrew Carnegie and Andrew Mellon. As interest in AFS grew, CMU spawned the Transarc Company to develop and market AFS. Once Transarc was formed and AFS became a product, the "Andrew" was dropped to indicate that AFS had gone beyond the Andrew research project and had become a supported, product-quality filesystem. However, there were a number of existing cells that rooted their filesystem as /afs. At the time, changing the root of the filesystem was a non-trivial undertaking. So, to save the early AFS sites from having to rename their filesystem, AFS remained as the name and filesystem root. In the late 1990's Transarc was acquired by IBM, who subsequently re-released AFS under an open source license. This code became the foundation for OpenAFS, which is currently under active development.
      It's still running and running well at CMU (AFAIK - as of late 90's). Every student gets an "Andrew" ID. Actually the very first networked computer I ever logged into (other than dialing a bbs) was a 'node' on Andrew, in 1988. Very very cool at the time, and still is.
      --

      Operator, give me the number for 911!
    3. Re:AFS by pHDNgell · · Score: 2, Insightful

      In my experience, it's one of those "it would be a wonderful thing if it worked."

      I've been using it for years. I've found nothing that works better. I've got ``clients'' that are IRIX, NetBSD, Solaris, SunOS 4, NetBSD, MacOS X and FreeBSD and I use it to serve my web root, home directories, various applications (my mail server etc...) I can't imagine using something else.

      It requires its own partition for each mount of it; you can't just share disks you've already got.

      This is very misleading. A file server has to have a dedicated partition. Clients need nothing but OpenAFS or similar installed. Mount points are global and management is distributed. Thinking that AFS is anything like NFS would certainly lead to a bad experience. It solves many, many problems with NFS.

      The servers' time has to be matched exactly, so it's also best if you've got an NTP server running and clients on all the machines.

      And your AFS server and client comes with them. I can't imagine what the problem would be with having times matched, anyway. I've gone through the horrors of tracking down log entries from systems that didn't have time synchronized. I don't want to do that again.

      It's also about ten times slower than Samba (which you might use instead to share with Windows machines), and it chokes when you try to move/copy/delete large files.

      Slower at what? Access times? Add another server, it's not like you have to tell the clients. Write times? I don't know about that, I wouldn't want to run a database off the thing, but that's not what it's for. I have no idea what you're talking about regarding it choking on large files. I haven't seen that.

      I tried it for a month before it completely corrupted its own partition and I switched back to NFS and Samba.

      How exactly did it corrupt its own partition? I've never seen such a thing. Perhaps you did something you were not supposed to do (like anything in its own partition).

      I can't wait for the day when these problems are but a memory and such a system works flawlessly.

      There have been some *very* large AFS installations for years (MIT, CMU, etc...). I wouldn't think that would be the case if such problems were common.

      --
      -- The world is watching America, and America is watching TV.
    4. Re:AFS by Umrick · · Score: 3, Informative

      Never mind that AFS has been in production for literally years, serving terabytes of data for 10 thousand + clients (in several installations of AFS).

      The Windows client did have some notable slowness issues, but performance with Linux is excellent, and it scales much better than NFS. Clients are available for a large number of OSs. As for clocks, it doesn't matter if it's the right time, just that all the machines agree on A time. So set up NTP on one machine as a primary, and the others can use ntpdate to set time once a day.

      AFS started around 1986 as a commercial offering; IBM made it open source in 2001. It can be a serious pain to set up at first, and the documents are indeed very outdated. Other limitations include no support for files larger than 2 GB. You can have read-only duplicates of data on multiple machines. Administration can be a dream once it's running.

      You will need to have ext2 partitions available for storage (OpenAFS uses its own transaction system, and you WILL have race conditions if you put it on a journalling filesystem).

      Also note that as of right now, 2.6 kernels are not supported, though 2.4/2.2 are fine.

      www.openafs.org

      CODA which was a start at an open source answer to AFS way back when, has even more out of date documentation, has never been used in production (that I know of), and basically is not nearly as ready for prime time as OpenAFS.

      www.coda.org

  6. Re:NBD Does this - NBD server for windows by flok · · Score: 5, Informative

    And since the guy is also using windows-boxes, an NBD-server for windows can be found here:
    http://www.vanheusden.com/Loose/nbdsrvr/
    This version enables you to also export partitions/disks.

    --

    www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi
  7. Why? by Anonymous Coward · · Score: 2, Funny

    I have 8 computers at my house on a LAN. I make backups of important files, but not very often

    I mean, let's be honest here. We are all dorks, but this guy is king dorkus dweedius maximus. Don't fool yourself about the "important data" - it is just pr0n and pirated MP3s.

    If it was real work, there would be a real IT guy with real RAID and real backup tapes working on the problem. But we know it isn't real work, because if this guy had a real IT job, he couldn't stand coming home and dealing with 8 friggin computers.

    We realize you think you are cool because you have a few KVMs, a couple of Linksys routers, and a bunch of old PIIs running Lunix with one Windows machine, but come on, man. Stop spanking yourself over your elite NAT-ed network and just get one computer with hardware RAID. Install Cygwin if you feel the need to type configure && make && make install a whole bunch of times and watch teh pretty text lines scroll.

  8. Most common form of data loss? by Anonymous Coward · · Score: 5, Insightful

    I'd argue the point that the most common form of data loss is a crashed hard disk.

    In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

    But I restore data accidentally deleted or changed by a user at least weekly! A distributed storage system won't help you there.

    However, I will grant that the average /. user knows what they're doing with their data far more than my average user does and is less likely to cause self-inflicted damage.

    1. Re:Most common form of data loss? by Blackknight · · Score: 4, Insightful

      That's one feature from VMS that I wish unix had. File versioning was built in to the file system, so if you wanted the old version of a file back you just had to roll back to the old one.

    2. Re:Most common form of data loss? by ckaminski · · Score: 3, Interesting

      But say I do? I mean, versioning databases are the next bit, man. Why not have a chmod +v for versioning? If this bit is set, then apply version control. Every file open/write/close sequence adds a new version delta. Sure, there's a performance hit associated with it, but I'd like the choice.

      AFAIK, there's at least one project out there to turn CVS into a filesystem, and a few others to add MVCC functionality into a filesystem (somewhat like the Clearcase filesystem does).

      It's a good feature, something I'd want on my docs and code, and other specs, not necessarily on my pr0n and MP3s.

      -Chris

    3. Re:Most common form of data loss? by steveha · · Score: 3, Informative

      0) Mirroring (RAID 1) takes double the disk space; but you could use RAID 5 instead. A 4 disk RAID 5 would take 4/3 as much disk space as you get to use.

      1) You could make a partition that is 10% of your disk, make another identical one on another disk, and mirror those. Then put your 10% critical data in there.

      2) Do what I do: set up a RAID server, and keep all critical data on that. This is good if you have a home network with multiple computers. It also makes data sharing easy among the computers.
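
      The space arithmetic in point 0) can be checked with a few lines of shell. This is a sketch, not anything from the RAID tools themselves: the function names and the 120 GB disk size are made up for illustration, RAID 1 is taken to mean every disk holds a full copy, and RAID 5 is taken to lose exactly one disk's worth of space to parity.

```shell
#!/bin/sh
# Usable capacity for a set of equal-sized disks.
# RAID 1: every disk holds a full copy. RAID 5: one disk's worth goes to parity.
raid_usable() {
    level=$1 disks=$2 size_gb=$3
    case $level in
        1) echo "$size_gb" ;;
        5) echo $(( (disks - 1) * size_gb )) ;;
        *) echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}

# Raw space divided by usable space, printed as a fraction such as 4/3.
raid_overhead() {
    level=$1 disks=$2
    case $level in
        1) echo "$disks/1" ;;
        5) echo "$disks/$((disks - 1))" ;;
        *) return 1 ;;
    esac
}

raid_usable 5 4 120     # four 120 GB disks in RAID 5 -> 360
raid_overhead 5 4       # -> 4/3
raid_overhead 1 2       # two-disk mirror -> 2/1
```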

      steveha

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    4. Re:Most common form of data loss? by angst_ridden_hipster · · Score: 4, Informative

      As I always chime in at this point:

      Use rdiff-backup!

      http://rdiff-backup.stanford.edu/

      Configurable, secure, distributed, versioning incremental backups.

      It's not a replacement for RAID, but is good for nightly inter-machine backups.

      There's also a related project where the far-end repository is encrypted, so you can have it on any public server without fear of having your data read by the wrong people.

      Very cool. It's saved my ass a few times.

      --
      Eloi, Eloi, lema sabachtani?
      www.fogbound.net
    5. Re:Most common form of data loss? by penguin7of9 · · Score: 2, Interesting

      That's one feature from VMS that I wish unix had.

      That feature doesn't need to be in the kernel, since it can easily and transparently be provided in user space.

      If you like, you can enable this right now using a simple hack on top of PlasticFS or your own, custom LD_PRELOAD hack.

      Providing file versioning in the kernel or enabling it globally in some other form has not caught on because it is a huge hassle and causes lots of problems, even in systems that know about it.

      For example, when you retag one MP3, do you want to keep an old version? What about if you retag your entire 50G collection of MP3s?

      The default of not versioning files in UNIX works better. Versioning and its implementation is highly application and implementation dependent. Emacs, OpenOffice, cvs, and other tools do the right thing, and they do it much better than anything the kernel could ever hope to do.

  9. Intermezzo by mikeee · · Score: 5, Informative

    Intermezzo is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back.

    It isn't particularly high-performance, from what I know, and may be more complexity than you need.

  10. Bandwidth by omega9 · · Score: 3, Insightful

    I hope you're looking at some fast lines to put between those boxen. Even at 100Mb/sec, doing RAID across a LAN could get slow.

    --
    I'm against picketing, but I don't know how to show it.
    1. Re:Bandwidth by twitter · · Score: 2, Insightful
      Bunk. If you can do raid over USB, you can do it over 10/100 ethernet. As long as it's just used for data storage, the loss in speed should be no big deal. Windows, at least, would not notice.

      --

      Friends don't help friends install M$ junk.

  11. RAID on Files by Great_Geek · · Score: 3, Insightful

    I have often wanted the same thing, kind of like RAID on files, call it RARF (Redundant Array of Remote Files). I was thinking along the line of a device driver that presents an ATA/IDE interface to the file system on one side and passes the requests to multiple copies of virtual disks. The virtual disks would be like VMWare disks, and potentially each on a different machine/location. Each virtual disk could even be encrypted differently.

    This would be really useful for SOHO type places to allow me to have a hot offsite backup at multiple friends' places (and vice versa).

  12. Backing up all within your house by Alain+Williams · · Score: 4, Insightful

    Hmmmm, what happens if your house catches fire ?

    8 copies of the same document all nicely toasted!

    1. Re:Backing up all within your house by feepness · · Score: 2, Funny

      Hmmmm, what happens if your house catches fire ?

      Come on, this'll never happen. I live in San Diego!

    2. Re:Backing up all within your house by Eric+Smith · · Score: 2, Interesting
      Hmmmm, what happens if your house catches fire? 8 copies of the same document all nicely toasted!
      Been there, done that. :-( Didn't even get a t-shirt.
  13. Loose Hard Drive? by Anonymous Coward · · Score: 2, Funny

    As opposed to a tight one?

  14. Speed would be an issue... by Trolling4Dollars · · Score: 4, Informative

    I imagine you'll need gigabit ethernet or multiple NICs in bonded mode. Then you have the performance of each individual system to take into account, especially if one of the systems is heavily used. I would recommend getting one BIG HONKIN' SERVER and putting it in a central location. Give it gigabit and let everything else connect to it at 100. Then, make sure it has a hardware RAID controller. Use SAMBA for the cross platform connectivity you desire, and voilà! Protected data with redundancy and high speed performance. If you go with remote display (RDP with Windows Terminal Server or X with *nix) then you have an even better approach as all the data will exist on the secure RAID box.

    I get what you mean though... it's a nice idea, but it would be costly to implement vs. what I suggested above.

    When I went to see a presentation on HP's SAN solutions last year, I was very impressed with the ideas they had. One big hardware box with multiple disks that are controlled by the hardware. They are then presented to any systems over a fiber link as any number of drives you wish for any OS. Finally, their "snapshot" ability was pretty impressive. (Also called Business Copy) All they would do is quiesce the data bus, then create a bunch of pointers to the original data. As data is altered on the "copy" (just the pointers, not a real copy), the real data is then copied to the "copy" with changes put in place. I imagine something similar could be accomplished with CVS...

    1. Re:Speed would be an issue... by LookSharp · · Score: 2, Informative

      ...as much as I dislike replying to T4D, he brings up an interesting scenario to counter your suggestion of using multiple machines.

      I took a spare machine, added a 3ware 6800 ATA RAID controller ($130 on eBay), and installed eight 120GB Maxtor hard drives ($1200 when I bought them last year) and put them in eight Genica hot-swap trays ($60). For about $1500, I now have an 800GB formatted RAID5 array. (Had to throw in a dedicated 400W Antec power supply for HDs.) In a year, two of the drives have flunked, and the replacement drives have rebuilt beautifully.

      If you're going to lose the site, you're going to lose your data in either case. All you protect against with the network situation is the complete loss of one machine. Protect your server as much as possible and put your data on it.

      Just make sure you keep the "most precious" data offsite on tape or a sneaker-net external hard drive, in case the pop-tart that got stuck in your toaster burns down your house. (This apparently happens about 30 times a year, by the way, including one of my co-workers :)

  15. Coda by fmlug.org · · Score: 3, Redundant
    Coda may do what you're looking for:
    # disconnected operation for mobile computing
    # is freely available under a liberal license
    # high performance through client side persistent caching
    # server replication
    # security model for authentication, encryption and access control
    # continued operation during partial network failures in server network
    # network bandwidth adaptation
    # good scalability
    # well defined semantics of sharing, even in the presence of network failures
    More info here http://www.coda.cs.cmu.edu/
    1. Re:Coda by quantum+bit · · Score: 2, Interesting

      If by "high performance through client side persistent caching" you mean "has to copy the entire 300MB video from the server to my local machine before it even starts playing, assuming it doesn't crap out because the default cache size is smaller than that", then yeah, go for it!

      Seriously, I looked into Coda a couple months ago and the design looks really cool, but it just doesn't seem to work very well unless you're only storing tiny text files. It also doesn't scale very well on large servers (i.e. it has a maximum limit on the number of files on each volume). Don't get me wrong, I REALLY wanted to use Coda because I liked the idea of it -- I just wish that it worked better. Ended up going back to NFS (yuck!).

  16. Distributed Network Block Device by JumboMessiah · · Score: 2, Informative

    A perfect solution would be a form of network block device that mounts distributed NBD shares. The Linux DRBD Project has this capability. From their website, "You could see it as a network raid-1".

  17. Try Rsync or DRBD by oscarm · · Score: 4, Informative

    See http://drbd.cubit.at/ -- DRBD is described as RAID1 over a network.

    "Drbd takes over the data, writes it to the local disk and sends it to the other host. On the other host, it takes it to the disk there."

    Rsync with a cron script would work too. I think there is a recipe in the linux hacks books to do something like what you are looking for: #292.

  18. Venti needs a mention by DrSkwid · · Score: 3, Informative


    http://plan9.bell-labs.com/sys/doc/venti/venti.html

    Abstract

    This paper describes a network storage system, called Venti, intended for archival data. In this system, a unique hash of a block's contents acts as the block identifier for read and write operations. This approach enforces a write-once policy, preventing accidental or malicious destruction of data. In addition, duplicate copies of a block can be coalesced, reducing the consumption of storage and simplifying the implementation of clients. Venti is a building block for constructing a variety of storage applications such as logical backup, physical backup, and snapshot file systems.
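
    The core idea in the abstract (a block is named by the hash of its contents, and is never overwritten) can be tried out with a toy store on a local directory. This is only an illustration of content addressing, not Venti's actual protocol or on-disk format: the STORE directory and the block_put and block_get names are invented here, whole files stand in for blocks, and sha1sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Toy content-addressed block store: a block's SHA-1 hash is its name,
# and existing blocks are never rewritten.
# STORE and the function names are hypothetical; requires sha1sum.
STORE=${STORE:-./blockstore}

# Store the file given as $1 and print its hash (the block identifier).
block_put() {
    mkdir -p "$STORE" || return 1
    hash=$(sha1sum < "$1" | cut -d ' ' -f 1)
    # Write-once: a block with this name already holds the same contents,
    # so duplicates coalesce into a single stored copy.
    [ -e "$STORE/$hash" ] || cp "$1" "$STORE/$hash"
    echo "$hash"
}

# Print the contents of the block whose hash is $1.
block_get() {
    cat "$STORE/$1"
}
```

    Storing the same contents twice returns the same identifier and leaves one file in the store, which is the duplicate-coalescing property the abstract mentions.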

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  19. Expensive but reliable solution by onyxruby · · Score: 2, Interesting

    I've been looking into something like this for a little while. What I'd like to do when I have the fundage is get a fileserver/backup box. The ideal is to run 4 160 GB IDE drives in RAID 5. This will give me a bit over 450 GB in usable network storage. I then want to add a pair of 250 GB 5400 drives for backup. I can then set up the server to back up the data from the RAID drives to the backup drives on a daily basis.

    According to pricewatch the 4 160's could be had for around $400 total with about another $400 for the backup. Add a 3ware RAID controller for another $245 and you're looking at about $1045 to convert a system into supporting 450 GB of usable network storage and backup.
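
    The figures above can be sanity-checked with a little shell arithmetic. This is a sketch: the function names are invented, and it assumes RAID 5 gives up one drive to parity and that drive makers count 1 GB as 10^9 bytes while many tools report GiB (2^30 bytes). Under those assumptions the four drives give 480 decimal GB, which is about 447 GiB.

```shell
#!/bin/sh
# Check the capacity and cost figures for a small RAID 5 file server.
# Drive makers count 1 GB as 10^9 bytes; many tools report GiB (2^30 bytes).
usable_gib() {
    # $1 = number of disks in the RAID 5 set, $2 = size of each disk in decimal GB
    awk -v n="$1" -v gb="$2" 'BEGIN { printf "%.0f\n", (n - 1) * gb * 1e9 / (2 ^ 30) }'
}

total_cost() {
    # Sum any number of whole-dollar amounts given as arguments.
    sum=0
    for price in "$@"; do
        sum=$((sum + price))
    done
    echo "$sum"
}

usable_gib 4 160        # RAID 5 over four 160 GB drives -> 447
total_cost 400 400 245  # RAID drives + backup drives + controller -> 1045
```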

    From all indications IDE hard drives are now the cheapest form of backup there is. I've looked at CD, DVD, Tape, but it keeps coming back to IDE hard drives. This is far cheaper than a similar storage and backup would be on tape.

  20. hyper scsi by blaze-x · · Score: 2, Informative

    from the website:

    HyperSCSI is a networking protocol designed for the transmission of SCSI commands and data across a network. To put this in "ordinary" terms, it can allow one to connect to and use SCSI and SCSI-based devices (like IDE, USB, Fibre Channel) over a network as if it was directly attached locally.

    http://nst.dsi.a-star.edu.sg/mcsa/hyperscsi/

  21. Rsync and Ssh by PureFiction · · Score: 4, Informative

    This is the way I do it, and although a little clunky, it allows me to keep remote backups of certain directories on three different servers.

    First, set up ssh to use pubkey authentication instead of interactive password. You can read the man pages for details but it basically boils down to running keygen on the trusted source:

    ssh-keygen -b 2048 -t rsa -f ~/.ssh/identity

    Then copy|append the newly created ~/.ssh/identity.pub to the remote hosts into their /home/user/.ssh/authorized_keys file.

    Now you can run rsync with ssh as the transport (instead of rsh) by exporting:

    export RSYNC_RSH=ssh or also passing --rsh=ssh on the command line.

    So to sync directories you could use a find command to update regularly:

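    # Force a sync on the very first run by creating an old marker file.
    [ -e .last-sync ] || touch -t 200001010000 .last-sync
    while true; do
        # Sync only if something changed since the marker was last touched.
        if find . -follow -cnewer .last-sync | grep -q .; then
            # Update the marker only when rsync succeeded, so a failed run is retried.
            rsync -rz --delete . destination:/some/path/ && touch .last-sync
        fi
        sleep 60
    done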

    Obviously this is pretty hackish and could be improved. But the point is that with ssh and rsync you could do automatic mirroring of specific filesystems or directories to remote locations securely.

    1. Re:Rsync and Ssh by adamfranco · · Score: 4, Informative

      Here is a nice page that explains how to do this. Even better, it shows how to do nice incremental backups using only slightly more space than the source (for the differing file versions). This makes for a pretty cheap and easy backup solution.

      --
      "When ideology and theology couple, their offspring are not always bad but they are always blind." -- Bill Moyers
    2. Re:Rsync and Ssh by strudeau · · Score: 2, Informative
      the original poster I think wants something that also works in Windows.

      Rsync and ssh can work with Windows using Cygwin. See this document for example.

  22. Speed by backtick · · Score: 4, Interesting

    Using a pair of Intel EEPro 100's w/ trunking (using both links at the same time on one IP, works w/ a cisco switch), I've gotten over 100 Mb/sec of actual throughput (I think I hit 137 Mbit/sec, peak) out of a box using NBD to create a mirrored RAID volume over the trunked ports. Now, my actual 'real' data speeds to the file system were about half that (call it 50-65 Mbit, or 6 to 7.5 MByte/sec), due to mirroring == writing it twice. Still not bad. Yes, the target disks were themselves part of other RAID volumes, for speed :)

  23. You aren't gonna get a real RAID. by PurpleFloyd · · Score: 5, Insightful
    First off, you aren't going to be able to use this like a real RAID array (a drive can die and you keep on working). The latency and bandwidth of any network that could be reasonably implemented in your home is going to prevent your system from acting like a real RAID array.

    Instead of trying to implement a shoestring SAN, go the simple route: throw up a Linux box running Samba for your "backup server;" it doesn't need much horsepower, just fairly fast drives and a network connection. Then schedule copies of your documents and home directories (using a cron-type tool on Linux and XCOPY called by the Task Scheduler on Windows, you should be able to hack something together that copies only changed files) every night at midnight, or some other time when you aren't using your computers. Although you might lose a bit of work if the system goes down, you won't ever lose more than 24 hours' worth.

    If you have more money to blow, then I would suggest that you invest in an honest-to-dog hardware RAID card and some good drives and put them into a server, then do everything across the network (put the /home tree and My Documents folders on the server). You can of course mount the /home directory in Linux via NFS or smbmount, and Group Policy in Windows 2K/XP will allow you to change the location of the My Documents folder to whatever you choose. You might be able to do the same via the System Policy Editor on 9x; it's been a while and I can't find the information after a brief Google.

    To sum up:

    • Don't blow millions on a SAN for your house.
    • Cheap route: cron jobs/Windows task scheduler to copy important folders across the network every night
    • More expensive route: invest in a server with real RAID, then mount your important directories from that.
    --

    That's it. I'm no longer part of Team Sanity.
    1. Re:You aren't gonna get a real RAID. by Cranston+Snord · · Score: 4, Informative

      Instead of xcopy, try RoboCopy, included in the windows NT/2k/xp/2k3 resource kit available here. It gives you almost as much control as rsync, including directory synchronization, touch control, ageing, network failure support, and others. I use this at work to move around copies of live production data to backup servers located offsite via vpn without any issues. More information on syntax can be found here.

      --
      And now for something completely different...a man with three buttocks.
    2. Re:You aren't gonna get a real RAID. by steveha · · Score: 2, Informative

      No need for an "honest-to-dog hardware RAID". Linux software RAID is simply great.

      Set up a server with multiple hard disks in a Linux software RAID, and run Samba and NFS on that. The Linux software RAID HOWTO explains all you need to know.

      steveha

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    3. Re:You aren't gonna get a real RAID. by dbarclay10 · · Score: 2, Informative
      First off, you aren't going to be able to use this like a real RAID array (a drive can die and you keep on working). The latency and bandwidth of any network that could be reasonably implemented in your home is going to prevent your system from acting like a real RAID array.

      I'm currently running some benchmarks on an XFS filesystem built upon a Linux MD RAID1 array, which is in turn built upon a local disk and a remote disk (which is at the end of a switched 100mbit network, the NBD server itself having an 8-year-old drive and a controller which doesn't do DMA).

      [ dbharris@willow: ~/ ]$ cat /proc/mdstat
      Personalities : [raid0] [raid1]
      md1 : active raid1 nbd0[1] dm-5[0]
      1888192 blocks [2/2] [UU]

      It takes approximately 10 minutes for a 1.8G array to sync. That's respectable. It's not blazing fast, but it's respectable.
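
      For a rough sense of what that sync time means, here is a small shell sketch that turns the block count from /proc/mdstat into an average rate. It assumes the "approximately 10 minutes" is exactly 600 seconds and that the blocks are the 1 KiB units mdstat reports; sync_rate_kib is a made-up helper name, not part of any RAID tool.

```bash
#!/usr/bin/env bash
# sync_rate_kib is a hypothetical helper: average resync rate in KiB/s.
# Arguments: array size in 1 KiB blocks (as listed in /proc/mdstat),
#            and the number of seconds the resync took.
sync_rate_kib() {
  local blocks=$1 seconds=$2
  awk -v b="$blocks" -v s="$seconds" \
    'BEGIN { if (s <= 0) exit 1; printf "%.0f\n", b / s }'
}
```

      Calling sync_rate_kib 1888192 600 gives roughly 3147 KiB/s, i.e. about 3 MB/s averaged over the whole resync.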

      The bonnie++ scores are:

      willow,1G,5086,31,4766,2,2873,1,6377,27,8655,2,158.7,1,16,878,18,+++++,+++,766,14,880,18,+++++,+++,595,13

      Which isn't amazing, but quite respectable, especially given that this type of thing wouldn't be used for mass storage of ISOs or whatever, but used for people's "My Documents" folders and their $HOMEs. Notable that a fully local array I have which is made up with an old SCSI controller and some old SCSI disks is about half this speed as far as the filesystem goes, and about a tenth the speed as far as syncing goes.

      So, I believe that your assertion of "you aren't going to be able to use this like a real RAID array" is quite incorrect. Especially given that my network isn't particularly fast, my NICs aren't particularly fast, and the remote disk I'm using is dog slow. Replace the NICs with parts that aren't pieces of crap, use Gig-E, and use controllers/drives that aren't 7-8 years old, and you'll get very respectable performance - ESPECIALLY given that the intention isn't to store everything on it, just people's individual files.

      P.S.: Yes. I'm repeating myself. I know this. It's deliberate :)

      --

      Barclay family motto:
      Aut agere aut mori.
      (Either action or death.)
    4. Re:You aren't gonna get a real RAID. by darrylo · · Score: 2, Interesting
      Cheap route: cron jobs/Windows task scheduler to copy important folders across the network every night

      Also, for those people concerned about leaving another "backup server" running 24x7, you can make use of the "wake on LAN" capability to do backups (available on many LAN/motherboards). Just wake up (boot) the "backup server", do your backup, and then shut it down. It's way cool to remote-boot home servers.

      Here, the only real issue is the power/thermal cycling of the hard disk once a day (or whatever), which might be a problem since many disks now tend to come with only a one-year warranty. However, this isn't all that different from a regularly-used PC.
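
      For anyone curious what "wake on LAN" actually sends: the magic packet is six 0xFF bytes followed by the target's MAC address repeated 16 times. The sketch below only builds that payload as a hex string. magic_packet_hex is a hypothetical helper name, and actually broadcasting the packet is left to a dedicated tool.

```bash
#!/usr/bin/env bash
# magic_packet_hex is a hypothetical helper; it only builds the payload.
# A Wake-on-LAN magic packet is 6 bytes of 0xFF followed by the target
# MAC address repeated 16 times (102 bytes, printed here as 204 hex digits).
magic_packet_hex() {
  local mac i packet="ffffffffffff"
  # Strip ':' and '-' separators and lower-case the hex digits.
  mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
  if ! [[ $mac =~ ^[0-9a-f]{12}$ ]]; then
    echo "invalid MAC address: $1" >&2
    return 1
  fi
  for ((i = 0; i < 16; i++)); do
    packet+=$mac
  done
  printf '%s\n' "$packet"
}
```

      Something like magic_packet_hex 00:11:22:aa:bb:cc (a placeholder MAC) shows the payload. In practice a tool such as wakeonlan or etherwake, if installed, builds and broadcasts the packet in one step.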

  24. You probably don't want to do this. by NerveGas · · Score: 3, Insightful


    Really. If you're on a 100-megabit LAN, that gives you a max of about 10 megaBYTES per second. So, if you have to transmit information to two other computers for every disk write, you're effectively limiting yourself to a maximum of about 5 megabytes/second disk transfer. And that's under GOOD situations. If you're doing random I/O, where the latency will be the determining factor, then take the latency of the hard drives, add in the latency of the networking, and the latency of the software layers, and you're looking at some pretty abysmal performance.
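
    The arithmetic above can be sketched in shell. This is only a back-of-the-envelope helper: effective_write_rate is a made-up name, the 10 MB/s input is the usable rate quoted above, and it assumes every remote copy crosses the same link with no further overhead.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope estimate, not a benchmark.
# effective_write_rate is a hypothetical helper name.
# Arguments: usable link rate in MB/s, number of remote copies per write.
effective_write_rate() {
  local link_mbytes=$1 copies=$2
  # Every write crosses the same link once per remote copy,
  # so the usable rate is divided by the number of copies.
  awk -v link="$link_mbytes" -v copies="$copies" \
    'BEGIN { if (copies < 1) exit 1; printf "%.1f\n", link / copies }'
}
```

    Running effective_write_rate 10 2 prints 5.0, which is the 5 megabytes/second ceiling mentioned above. Latency is not modelled at all, so random I/O will look worse than this.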

    Using rsync in a cron job will solve your backup problems. In fact, your script can use rsync to do the synchronization, and tar/gzip to archive the backup - giving you "point in time" snapshots for when someone says "I deleted this file 4 days ago, can you get it back?"
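
    A minimal sketch of that rsync-plus-archive idea, under some assumptions: the function names and the snapshot-YYYY-MM-DD.tar.gz naming are invented for illustration, the mirror and archive directories are local paths of your choosing, and there is no pruning of old archives.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a nightly mirror-then-archive backup.

# snapshot_path: build the archive file name for a given date stamp.
snapshot_path() {
  printf '%s/snapshot-%s.tar.gz\n' "$1" "$2"
}

# run_backup: mirror SRC into MIRROR with rsync, then tar/gzip the
# mirror into ARCHIVES so each night leaves a point-in-time copy.
run_backup() {
  local src=$1 mirror=$2 archives=$3
  local stamp
  stamp=$(date +%Y-%m-%d)
  mkdir -p "$mirror" "$archives" || return 1
  # -a preserves permissions and times; --delete keeps the mirror exact.
  rsync -a --delete "$src"/ "$mirror"/ || return 1
  # Archive the mirror's contents relative to its own root.
  tar -czf "$(snapshot_path "$archives" "$stamp")" -C "$mirror" .
}
```

    Saved as a script that ends with a call such as run_backup "$HOME" /backup/mirror /backup/archives (placeholder paths), it can be started from a crontab entry like "0 0 * * * /path/to/script" to run at midnight. Restoring a file deleted four days ago is then a matter of extracting it from the archive with the matching date.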

    steve

    --
    Oh, you're not stuck, you're just unable to let go of the onion rings.
  25. I can't believe... by wcdw · · Score: 2, Interesting

    ...this question even got asked. Ok, if you *need* to share the same device across machines, something like the network block device can be a real help.

    If all you're worried about is disk failures, mirror each disk locally. Disks are cheap, and real operating systems don't have any trouble with software mirroring.

    Why would you want to make all of your machines suddenly non-functional, just because one of them lost a network card? Or the switch failed? Or ....

    --
    If you're not living on the edge, you're just taking up space!
  26. Re:So... by macshune · · Score: 4, Funny

    Man, if Beowulf was alive today he'd so kick Slashdot's ass. Seriously, this dude killed monsters, saved villages and killed a dragon. He has armor that would make any slashdotter cream their jeans when they look at the armor's tag and it says AC -9. Don't even get me started on the weapons.

    If you were a medieval ass-kicker, would you want your moniker to be the butt of thousands of canned-jokes that weren't even funny to begin with?

    Hmm...that's like a Beowulf cluster of usb thumb drives...

    Yeah. Maybe the cheap super-computer idea Beowulf would find cool, but not the jokes and the impossible-to-Beowulf devices.

    So those jokes aren't funny and probably won't get you (not you in particular, Pingular) modded up. If you want to talk about networked clusters of non-networkable devices, say:

    "That's like a Duke Nukem Forever/Bit Boys graphics card/Mac OS X on a 386 cluster"

    No wait, on second thought, that's not funny either.


  27. Re:Comment by skinny23 · · Score: 2

    We've used something called MirrorFolder to mirror contents of specific folders across a network. It worked fairly nicely and integrated well with Windows Explorer.

    http://www.techsoftpl.com/backup/

  28. I do this.... by CSG_SurferDude · · Score: 3, Funny

    I do this everynight to thousands of machines...

    The software I use is Kazaa-lite.

    Oh, you mean files other than my MP3s/jpegs/mpegs? Sorry, I can't help you there.

    1. Re:I do this.... by Cyno · · Score: 2, Funny

      See, Kazaa is a perfectly legitimate technology, if only the RIAA and MPAA could stop polluting it with their copyrighted commercial garbage.

      I blame Jack Valenti for this whole mess.

  29. Parallel Virtual File System by richoid · · Score: 4, Informative

    http://www.parl.clemson.edu/pvfs/

    "The goal of the Parallel Virtual File System (PVFS) Project is to explore the design, implementation, and uses of parallel I/O. PVFS serves as both a platform for parallel I/O research as well as a production file system for the cluster computing community. PVFS is currently targeted at clusters of workstations, or Beowulfs."

    "In order to provide high-performance access to data stored on the file system by many clients, PVFS spreads data out across multiple cluster nodes, which we call I/O nodes. By spreading data across multiple I/O nodes, applications have multiple paths to data through the network and multiple disks on which data is stored. This eliminates single bottlenecks in the I/O path and thus increases the total potential bandwidth for multiple clients, or aggregate bandwidth."

    Or there are many others to choose from; google for clustered filesystems:

    http://www.yolinux.com/TUTORIALS/LinuxClustersAndFileSystems.html

  30. Slow? by cerebralsugar · · Score: 2, Informative

    I certainly would attest that this is a cool idea. I have a few systems at my place and it would be neat to make a single filesystem spanning all the storage on the network.

    However, while small files would be fine, I would think the speed of the network would make for some fairly slow storage on a 100mbit network.

    Add more users saving files across the network to the equation and things would get out of hand fast.

    I guess I would just buy a serial ata raid motherboard (the intel D865GBFLK is one I have been thinking about), and just do 1:1 mirroring. Cheaper than scsi, and pretty darn fast.

    --
    Easy guys, I put my pants on one leg at a time. The difference is after I put on my pants I make gold records!
  31. Raid != Backup by Alan · · Score: 2, Informative

    Don't forget that RAID only protects you from hardware failures, it doesn't prevent you from doing an "rm -rf important_file" :)

    Personally I have a server with a RAID 5 array that is shared via SAMBA to windows and linux clients, which works fine, though I may adjust this if good suggestions are made here. The only real issue would be disk space, and all my computers now have 120G+ hard drives or RAID array....

  32. New kind of network file system needed by rar · · Score: 2, Interesting

    I don't think the RAID algorithm is the right way to synchronize all your data when applied on the larger scale. I imagine that what a person really wants to do is to unify all his accounts, on slow and fast links all over the world, to look like one huge synchronized partition which stores the data throughout the accounts with sufficient redundancy (meaning something like 'keep copies of all data in at least three different locations'). I think using RAID for this would give horrible performance and not be nearly flexible enough in how data is distributed through the different locations.

    A new networked file system is needed. I am working on such a solution in my spare time (but it is still in the design phase).

    The main idea is to unify cache and storage. This means that the least used files are deleted when an account is running out of storage, but under the constraint that a minimum number of copies of each file is kept online. (Hence, data will propagate to the nodes that actually use it.) Upon a data request, the filesystem goes out and fetches the data, preferably in some P2P-like way where it is fetched simultaneously from all locations that have copies of that data.

    If someone knows a solution that already works something like this, please tell me.
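
    To make the eviction rule above concrete, here is a toy shell sketch, not the design itself. It assumes a made-up catalogue format of one line per file ("name last_access_epoch online_copies", names without spaces), uses last access time as a stand-in for "least used", and eviction_candidates is a hypothetical name.

```bash
#!/usr/bin/env bash
# eviction_candidates is a hypothetical helper illustrating the rule.
# stdin:  one file per line as "<name> <last_access_epoch> <online_copies>"
# stdout: names that may be dropped locally, least recently used first.
eviction_candidates() {
  local min_copies=$1
  # Oldest access time first, then keep only files that would still have
  # at least min_copies online after the local copy is deleted.
  sort -n -k2,2 | awk -v min="$min_copies" '$3 > min { print $1 }'
}
```

    With a minimum of three copies, a file that currently has exactly three copies online is never offered for deletion, however stale it is. Only files with a spare copy show up in the list.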

  33. Or try Groove workspace for Windows by AllDigital · · Score: 2, Informative

    Groove workspace is a collaborative environment, but it does have a component that allows you to share an archive of files.

    Worth considering because:
    - Files are encrypted and sent in an encrypted format.
    - Files placed in the shared space are mirrored on all systems that are members of the workspace.
    - The software is free for non-commercial use.
    - Lots of other interesting features to play with.
    - You can even mirror with a machine across the Internet.

    Limited by:
    - The speed of your connection.
    - Windows users only.

    Go check it out at http://groove.net/

    Does anyone know if there are efforts in the open source community similar to...or designed to enhance this product?

  34. DRBD does it as well... by Ron+Harwood · · Score: 2, Informative
  35. The obvious solution by swagr · · Score: 3, Funny

    is to use IP over Carrier Pigeon.

    Then the only remaining issue is number of pigeons.

    --

    -... --- .-. . -.. ..--..
  36. Re:Intermezzo by laursen · · Score: 5, Informative
    Intermezzo is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back.

    We have looked at various distributed filesystems for use in a clustered setup of webservers. We wanted to remove the single point of failure from a central NFS server - Intermezzo was one of the filesystems we had a look at.

    The idea behind Intermezzo is fairly simple and the documentation is good. The Intermezzo system looked like an ideal solution for our setup (Coda and OpenAFS are far too complex for use in a distributed filesystem on a closed internal net).

    We tested the system but sadly it's not really production stable and I can't advise that you use it.

    If you are looking for a SAFE solution then Intermezzo is not for you - you will just end up with garbled data, deadlocks and tons of wasted time ...

    My 2 cents.

  37. Re:Intermezzo by laursen · · Score: 2, Informative

    We bought a large StorageTek raid (2 x RAID 5) and used NFS.

    NFS is a proven filesystem and it has been tested for years. It's compatible with all major UNIX flavors and BSD/Linux systems.

  38. Why not use Freenet? by La+Camiseta · · Score: 2, Insightful

    It seems to be a great problem solver for what you're trying to do. First off, on initial start it only connects to computers it knows, or downloads info about a couple of nodes from the main website, but if you were to export your noderef and import it into all of your other systems instead of the default noderefs, then you could have a distributed storage network set up among all of your computers.

    Granted, you'd have to have a bit more storage dedicated than you'll be storing, but if you want every file to have a decent backup, then that's one of the prices you'll have to pay. Also, it's self cleaning when it comes to backups, because it automatically pushes out the old, less requested files in favor of the newer, more requested files.

    Another solution, should your systems be using Linux, is maybe something like GNUnet, which is built around sharing files in both a distributed and an anonymous manner.

  39. Re:I used to do this, years ago.. by caluml · · Score: 2, Insightful

    Listen, Sonny Jim. You'll not be getting any mod points from us by bringing up the last contender to Windows, which failed miserably. We're feeling good about ourselves right now, and we don't need bringing down.

  40. Yes. by Ayanami+Rei · · Score: 2, Informative

    Software RAID/LVM can detect which volumes go where by magic numbers written to them when you format them. But you still have to set up all the remote NBDs correctly on a new machine, and you need the old setup file from the old machine that tells it what block devices/partitions to use.

    NOTE!

    You shouldn't leave any NBD-exported volumes on the new master. Make it into a physical, local volume, but reference it in the "same place" in your RAID configuration.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  41. Is speed a factor? by adrianbaugh · · Score: 2, Insightful

    I take it you've thought about speed issues? RAID over a 100mbit link doesn't sound like great fun - leastways I wouldn't put my swap on such a drive :) Gigabit might work though.

    --
    "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
    - JRR Tolkien.
  42. A case for distributed LAN storage by Luminary+Crush · · Score: 2, Insightful

    I understand many of the comments here which say "put in a big honkin' server and hardware RAID". That would be a better solution from a purely 'let's serve files and protect data' standpoint if you can accommodate a single, large server and want the best performance.

    However, I see a use for a network LAN storage system. Every machine these days comes with a 72G drive or larger installed locally, yet we are trained as IT personnel to say 'don't store anything locally, it's not secure or safe, put it on one of our nice big honkin' servers'. Unfortunately, those big servers cost a lot of money, often require specific admins (e.g. SAN experts to deal with the management software, dividing up LUNs, etc.), and may involve a lot of red tape to justify additional storage allocation for your project.

    What to do with all that local disk space that, if unused as most centralized IT would rather have you do it, would be a vast untapped storage resource?

    The concerns regarding latency are well understood, but this might not be a factor if this LAN storage array was used for 'archive' storage where real-time high speed access isn't the driving factor. A RAID 5 system would be far too fragile, since if any two nodes were offline or rebooting, the entire storage array would be unavailable. You'd need to have more redundancy than that.

    I could see an interesting application using multiple nodes each contributing disk space to a LAN archive storage array which would be 'written to' and retrieved with similar expectations as writing to a tape drive. The bonus would be that you could work on files in realtime over such a network, just quite slowly (many vendors used to offer archive file systems which worked this way using tape or optical drives as the storage medium - AMASS was one such vendor).

  43. Lustre and PVFS by nagare · · Score: 3, Insightful

    The Lustre project (www.lustre.org) is supposedly going to be the end all/be all of distributed parallel file systems, but I believe it is still fairly unstable and not ready for production use. In the meantime, the best one out there is PVFS (www.parl.clemson.edu/pvfs/). Fat chance trying to find Windows clients, but you can always re-export it with Samba.

  44. Re:What if one of the nodes goes down? by cbreaker · · Score: 4, Insightful

    What if you reboot one of the NBD servers? While you'll still have access to the data since it's a raid, I would well imagine that you would have to rebuild the entire "disk" once it comes back online.

    Assuming a Raid5 with three nodes, and two go down not at the same moment, will all your data be lost?

    I would think very carefully about these issues before putting all your valuable data on it. RAID isn't really designed for frequently unreliable connections like this. It's meant to prevent data loss if a hard drive crashes, which should be a fairly uncommon thing within a single system.

    --
    - It's not the Macs I hate. It's Digg users. -
  45. Why? by Illbay · · Score: 5, Funny
    ...if I loose one of the disks that comprise the raid, the image would automatically reconstruct itself...

    Why would you want to "loose" one of the disks? Don't you know they're supposed to stay tightly enclosed in their little boxes?

    And why do you think that "loosing" the disk would help the image "automatically reconstruct itself?"

    Actually, if you did that the disk would carom around the room like a very fast, very lethal Frisbee and you would be too busy trying to survive to worry about where your data went!

    Just a thought

    Otherwise, your plan sounds peachy.

    --
    Any technology distinguishable from magic is insufficiently advanced.
  46. Check out HiveCache by Jim+McCoy · · Score: 2, Informative
    HiveCache is a distributed RAID system similar to what you are asking for, albeit one that is pitched more at the enterprise backup environment than at the home user. It offers strong security, error correction and data replication, and multi-source data publication and retrieval to eliminate the network hotspots that might otherwise occur.


    While a pure linux solution seems to score the most points here, this particular one lets you combine your windows, OS X, and linux systems into a single distributed storage mesh. There is safety in numbers, and the more systems you can add to these sort of distributed storage systems the more reliable they become.


    HiveCache is more of a backup solution, but I do know that it is possible to use this with a webDAV front-end for archival storage and other interesting storage possibilities.

  47. Rsync & Rdiff-backup by hrath · · Score: 2, Informative

    Check out http://rdiff-backup.stanford.edu/ for the wonderful rdiff-backup.

    With the combination of rsync, ssh & rdiff-backup I have setup a very reliable incremental network backup infrastructure, allowing me to go back to any previous version of a file.

    regards,

    Heiko

  48. File versioning useful, VMS variant not so sure by kingdon · · Score: 2, Interesting

    The concept of being able to see the previous version sounds good. But on VMS, file versions didn't really achieve this all that well. Classic example: how do you delete a file?

    Try #1:

    DELETE FOO.TXT

    This is really the wrong answer. If you have FOO.TXT;1 and FOO.TXT;2, then this command deletes FOO.TXT;2 and any attempt to access FOO.TXT will get you FOO.TXT;1.

    Try #2:

    DELETE FOO.TXT;*

    This is the common recommendation, but you've now lost the ability to see any of the old versions.

    The GNU file utilities (and emacs and some other GNU programs) have a file versioning scheme which is somewhat similar to VMS but somewhat better. Look at commands like "VERSION_CONTROL=numbered cp -b foo bar".

    Personally, I usually put things which matter in CVS. With the CVS server in a distant city (at an ISP which provides ssh shell accounts). That gives me off-site backups.

  49. HyperSCSI by Nicson · · Score: 2, Informative

    I'm surprised to see nobody has yet mentioned HyperSCSI, which is:
    - opensource
    - based on raw ethernet (supposedly faster than iSCSI or other TCP/IP-based schemes)
    - has a Win2K client

    Check it out, I've tested and used it for about a year and it works quite well!
    --
    Nicson

  50. Distributed Internet Backup System by trawg · · Score: 2, Informative

    not really relevant, but may still be of interest to some (just sounds so neat): "Since disk drives are cheap, backup should be cheap too. Of course it does not help to mirror your data by adding more disks to your own computer because a fire, flood, power surge, etc. could still wipe out your local data center. Instead, you should give your files to peers (and in return store their files) so that if a catastrophe strikes your area, you can recover data from surviving peers. The Distributed Internet Backup System (DIBS) is designed to implement this vision. "

    http://www.csua.berkeley.edu/~emin/source_code/dibs/

  51. EtherDrive Storage by web_guy1000 · · Score: 2, Informative

    You might consider EtherDrive storage from www.coraid.com. I use it on Linux with software raid. Works like a champ.