Slashdot Mirror


Distributed Storage Systems for Linux?

elambrecht asks: "We've got a _lot_ of data we'd like to archive and make sure it is accessible via the web 24/7. We've been using a NetApp for this, but that solution is just waaaay to expensive to scale. We want to move to using a cluster of Linux boxes that redundantly store and serve up the data. What are the best packages out there for this? GFS? MogileFS?"

17 of 52 comments

  1. Lustre by Yerase · · Score: 5, Informative

    Check out Lustre at http://www.lustre.org/. It's being developed/used by the DOE on a lot of supercomputer cluster systems, for multi-terabyte storage stuff.

  2. Panasas -- check it out by middlemen · · Score: 3, Informative

    Panasas http://www.panasas.com/products_overview.html has some products which probably fit your requirement of high speed distributed storage.

  3. Our crystal ball is fuzzy! by afabbro · · Score: 4, Insightful
    What kind of idiotic Ask Slashdot is this? All of the important data is missing:
    • What's "a lot"? 1MB is a lot of data if you think about it. When people start talking about "a lot" of data these days, I assume they're meaning hundreds of terabytes. Is that what you mean?
    • What's the budget? What performance do you need? Do you need to back it up? Do you need to replicate it? Your post is sort of like "hi, I have a problem. What is the answer? Thanks!"
    Also, it's "too expensive to scale," my friend. You'd think an "Editor" like Cliffy would fix posts, but he's too lazy.

    If you can afford NetApp, why not keep with NetApp? A bunch of Linux boxes is not a storage solution. Indeed, what does Linux have to do with anything? We're talking storage here. What are you planning to do - put in 200 of them with internal SATA drives? Yeah, that'll be a lot cheaper to maintain...

    I'm not shilling for NetApp, but if you really have "a lot" of data to put "on the web" "24/7" then you need some kind of real storage solution like a NetApp or one of their competitors.

    Now go away and please take Cliff with you.

    --
    Advice: on VPS providers
    1. Re:Our crystal ball is fuzzy! by Punboy · · Score: 2, Insightful

      A bunch of Linux boxes is not a storage solution.

      Hey man, don't tell that to Google.

      --
      If you like what I've said here, and want to read more, go to http://www.krillrblog.com
  4. AFS ?? by forsetti · · Score: 3, Interesting

    How about OpenAFS? It is sort of like NFS on steroids, with redundancy, scaling, caching, Kerberos-based security ... I've just started looking at it myself, but it seems pretty slick.

    --
    10b||~10b -- aah, what a question!
  5. look at the rest NetApp is 4th place by johnjones · · Score: 2, Informative

    NetApp is number four in storage revenue terms, after EMC, HP and IBM

    so go ask them about what you want

    really you can admin your white boxes (that become a NAS) or you can get a NAS

    are you thinking SAN?

    also talk to Apple, they do some nice products, as well as Sun

    what's this for, large data?
    for video data go talk to SGI about their XFS products

    really it depends on what you're doing. NetApp is great for a company file system of documents but bad if you want to get the most out of your storage and you mostly do video/music and don't care about snapshots etc.

    regards

    John Jones

  6. Centera by egarland · · Score: 4, Informative

    Get A Centera.

    I'm biased but this is a high level Linux based storage system done right. It's not easy to create a coherent storage system out of lots of separate machines, and the software that runs on this cluster does a lot of work. This thing is fully redundant with no single point of failure, dynamically expandable without even taking it offline, it scales to 100's of terabytes and manages all that content continuously (scanning for corruption and fixing it, garbage collecting, etc.). The cluster has redundant backend networks and parallel paths everywhere, it even uses reiserfs to store the data. There's a lot of good engineering in this unit and they sell it at a decent price compared to NAS boxes.

    Check it out:
    http://www.emc.com/products/systems/centera.jsp
    I do work for EMC (like I said.. I'm biased) but I don't speak for them, my opinions are my own.

    Storage clustering is simply hard to do while still presenting a low level filesystem interface. Tossing that out and creating file storage as a high level service with a richer interface seems like the right approach to me. Show me a storage clustering solution that doesn't do that and I'll show you something full of bugs, expandability issues, limitations, and pain points.

    --
    set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
    1. Re:Centera by egarland · · Score: 2, Insightful

      But we call the Centerra a "data jail". It's like the roach motel..

      Ug. It's just not true. Most applications that are built to work with Centera include functionality to migrate in/out of the system just like most applications that are built to work with tape can both put data on and get it back. The difference is tape sucks, Centera doesn't.

      It can't scale beyond a 42U rack enclosure.

      Also not true. I have worked extensively with a 3 rack install with about 50tb of data on it. I believe all versions of Centera since the very first are capable of scaling to 4 racks and some are capable of going to 8 racks. Lots of customers have 2 rack installs. Raw storage on the currently shipping nodes is over 1 tb per node and you can put 32 nodes in a rack. Do the math, a 4 rack Centera is quite big even after taking mirroring or CPP into account.
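      A rough sketch of "the math" using only the figures quoted above (1 TB per node as a lower bound, 32 nodes per rack). The 2x mirroring factor is an assumption for illustration; the overhead of CPP is not given in the comment, so it is not modeled here.

```python
# Back-of-the-envelope capacity from the figures quoted in the comment.
TB_PER_NODE = 1        # "over 1 tb per node", so treat this as a lower bound
NODES_PER_RACK = 32    # "you can put 32 nodes in a rack"


def raw_capacity_tb(racks, nodes_per_rack=NODES_PER_RACK, tb_per_node=TB_PER_NODE):
    """Raw (unprotected) capacity in TB for a given number of racks."""
    return racks * nodes_per_rack * tb_per_node


def mirrored_capacity_tb(racks, copies=2):
    """Usable capacity if every object is stored `copies` times (assumed 2x mirroring)."""
    return raw_capacity_tb(racks) / copies


if __name__ == "__main__":
    for racks in (1, 2, 4, 8):
        print(racks, "racks:", raw_capacity_tb(racks), "TB raw,",
              mirrored_capacity_tb(racks), "TB mirrored")
```

      Under those assumptions a 4 rack install comes to at least 128 TB raw, or about 64 TB with everything stored twice.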

      It's a bunch of little servers striped together to form a big NAS with a metedata controller in the middle.

      No. No No.

      It IS a bunch of little servers but no they are not "striped together", and no they don't form a NAS. There is no "metadata controller" and there certainly isn't one in the middle. It is a storage cluster that has features specifically designed to store fixed content. Centera is not a simple Linux hack to make a bunch of boxes look like a storage cluster. It's a robust, flexible, well thought out piece of clustering software that is built on top of a Linux base.

      Centera hardware is good stuff too. It has redundant externally facing servers (access nodes) so that if one fails, applications can keep working. Both back end switches are linked to every node so everything has redundant data paths. Data is stored in such a way that no data is unavailable if any single node fails or goes offline for any reason.

      It's easy to dismiss Centera because it's so different from the standard storage systems whose basic interfaces really haven't changed in 3+ decades. It's not a block device. It's not a filesystem. It's not a mountable share. It's a storage cluster with functionality specifically designed to manage fixed content. It is accessed only through a client side API that talks to the cluster over IP. It isn't easy to wrap your head around.

      --
      set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
  7. Ask Google by VernonNemitz · · Score: 2, Interesting

    I'm sure they'd be happy to sell you something along the line of serving data....

  8. Clustering filesystems- an overview by houdini_cs · · Score: 4, Informative

    I did some research on clustering filesystems for work a while ago. Here's the Cliffs-notes version:

    • GFS: High-end, a pain in the ass to set up and run. Wants a RHEL server or two to run.
    • OpenGFS: Started as a fork of GFS when the GFS license changed, it has followed a bit of a different path. Not nearly as stable or fast as GFS, but might be there some day.
    • Lustre: Lustre should be really nice, but is horrendous to run (at least, that's the word from my friends at Sandia, who know a thing or two about it). General consensus is that you need a full-time staff member just to make it work. If you can afford that, it's a good way to go.
    • PVFS: Fast, light-weight, not POSIX-compatible. If your apps don't need the stuff it doesn't do, or you're willing to write some glue code for your app to speak PVFS natively instead of using the FS driver, this is a great way to go. Looks simple to set up (as simple as these things get).
    --
    ^]:wq
  9. Converting extra Windows(tm) workstation space? by Dr.Dubious+DDQ · · Score: 4, Interesting

    A barely-related subject - I've been wondering whether there's some way to collect the unused space on all the Windows workstations around here into a shared space for storage.

    This is purely a speculative exercise, but I keep wondering if some combination of:

    • Every Windows(tm) workstation "shares" an otherwise-empty subdirectory
    • a Linux box creates and uses a "filesystem image" file of some kind ("loopback mount"-style image) stored on each share over SMB/CIFS
    • Linux uses VFS to combine the individual virtual drives into a larger drive (or perhaps two identical-size virtual drives, which are then combined into a single software RAID 1 array?)
    • Linux then shares this Rube-Goldbergian system as a Samba share...
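
    To make the "combine the individual virtual drives into a larger drive" step concrete, here is a toy sketch of linear concatenation: given the sizes of the per-workstation image files, it maps a byte offset in the combined volume to the image that holds it. The function names are hypothetical and this is only an illustration of the address arithmetic, not how any Linux driver actually implements it.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(image_sizes):
    """Return the starting offset of each image within the combined volume."""
    return [0] + list(accumulate(image_sizes))[:-1]


def locate(offset, image_sizes):
    """Map a byte offset in the combined volume to (image_index, offset_in_image)."""
    total = sum(image_sizes)
    if not 0 <= offset < total:
        raise ValueError("offset outside combined volume")
    starts = build_offsets(image_sizes)
    # The owning image is the last one whose start is <= offset.
    index = bisect_right(starts, offset) - 1
    return index, offset - starts[index]


if __name__ == "__main__":
    sizes = [100, 200, 50]        # three image files of different sizes (bytes)
    print(locate(150, sizes))     # falls 50 bytes into the second image
```

    A mirrored (RAID 1 style) layout would instead send every write to both halves, so the usable size is that of the smaller half.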

    Yes, I know it's kind of silly, and performance seems like it would be pretty pathetic, but the more I think about it, the more I want to see if I could actually do it (think pretty much the same mindset that the IP-over-carrier-pigeon guys had...)

    Heck, it might conceivably actually WORK for a large-but-infrequently-accessed historical repository or something...

    Or has someone already started some sort of "Virtual ATA-over-ethernet-from-a-file driver for Windows" project and spoiled my fun?...

    1. Re:Converting extra Windows(tm) workstation space? by egarland · · Score: 2, Interesting

      If you want to try building it, I'd suggest you start with a nice high level method of creating linux based filesystems:

      http://perlfs.sourceforge.net/

      Build it first, optimize later.

      FYI.. The multi-threaded filesystem version exists, I just haven't bundled it up pretty for distribution. Now someone needs to create a multi-threaded samba to share it out.

      --
      set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
    2. Re:Converting extra Windows(tm) workstation space? by LuckyStarr · · Score: 2, Informative

      [...], but I haven't done any digging to find out if it's possible to directly create a ('standard') filesystem as an image file. (Hints welcome...)

      Huh? Just run mkfs.whatever on your file. Should work without problems. Your filesystem is as large as it would be on an equally large block device.

      Example:

      $ mkfs.ext3 file
      mke2fs 1.36 (05-Feb-2005)
      file is not a block special device.
      Proceed anyway? (y,n) y

      Filesystem label=
      OS type: Linux
      Block size=1024 (log=0)
      Fragment size=1024 (log=0)
      1784 inodes, 7116 blocks
      355 blocks (4.99%) reserved for the super user
      First data block=1
      Maximum filesystem blocks=7340032
      1 block group
      8192 blocks per group, 8192 fragments per group
      1784 inodes per group

      Writing inode tables: done
      Creating journal (1024 blocks): done
      Writing superblocks and filesystem accounting information: done

      This filesystem will be automatically checked every 38 mounts or
      180 days, whichever comes first. Use tune2fs -c or -i to override.
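
      The transcript above assumes the file already exists. A minimal sketch of the missing first step, creating a sparse image file of a chosen size and then formatting it, might look like this. The helper names are made up for illustration; `-F` is the mke2fs option that skips the "not a block special device" prompt, and `format_ext3` needs mkfs.ext3 installed.

```python
import os
import subprocess


def create_image(path, size_bytes):
    """Create a sparse file of the given size to hold a filesystem image."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)   # extends the file without writing data blocks
    return os.path.getsize(path)


def format_ext3(path):
    """Run mkfs.ext3 on the image file; -F answers the 'Proceed anyway?' prompt."""
    subprocess.run(["mkfs.ext3", "-F", path], check=True)


if __name__ == "__main__":
    size = create_image("file", 8 * 1024 * 1024)   # 8 MiB image, size chosen arbitrarily
    print("created image of", size, "bytes")
    # format_ext3("file")  # uncomment on a machine with e2fsprogs installed
```

      Mounting the result would then be a loopback mount, which needs root and is left out here.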

      --
      Meme of the day: I browse "Disable Sigs: Checked". So should you.
  10. We use OpenAFS by Bamfarooni · · Score: 4, Insightful

    We have about 27TB of data from Mars (and adding another TB per month) that we need to keep online. We have been using netapps, but at ~$25K/TB, plus maintenance (3 years maintenance is about as much as a whole new system) they're just WAY too expensive for data warehousing.

    We've moved to using Linux-based OpenAFS servers. A high quality 3U box (qsol.com) loaded with 16x 300GB ATA drives costs about $8.5K and provides us about 3.5TB (2 drives for parity, 2 drives for hot-swap). That works out to $2.5K/TB. If your risk tolerance is higher than mine, you can bring that up to $8K/5.5TB, for about $1.5K/TB. We really want 99.999% availability, so just to be safe, we keep a 100% redundant read-only copy on a second machine (AFS supports this beautifully, including automatic fail-over).
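
    The per-terabyte figures above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in this comment; the `replicas` parameter is an added assumption to show what the second read-only copy does to the cost, and maintenance is not modeled.

```python
def usable_gb(drives, drive_gb, parity=0, spares=0):
    """Capacity left after setting aside parity and hot-spare drives."""
    return (drives - parity - spares) * drive_gb


def cost_per_tb(box_cost_usd, usable_tb, replicas=1):
    """Hardware cost per usable TB when the data is kept on `replicas` boxes."""
    return box_cost_usd * replicas / usable_tb


if __name__ == "__main__":
    # 16 x 300GB with 2 parity and 2 hot-swap drives: the "about 3.5TB" box.
    print(usable_gb(16, 300, parity=2, spares=2), "GB usable")
    print(round(cost_per_tb(8500, 3.5)), "USD/TB for the conservative box")
    print(round(cost_per_tb(8000, 5.5)), "USD/TB for the higher-risk option")
    print(round(cost_per_tb(8500, 3.5, replicas=2)), "USD/TB with a full second copy")
```

    Even with the full second copy, the hardware cost stays well under the ~$25K/TB quoted for the NetApp.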

    OpenAFS has a couple of features that make it better than NFS (client-side cache, for instance), but it also has a few drawbacks, like no files >2GB.

    1. Re:We use OpenAFS by luizd · · Score: 2, Informative

      Not anymore in OpenAFS 1.3.81.

      Copied from release notes:

      For UNIX, 1.3.81 is the latest version in the 1.4 release cycle. Starting
      in 1.3.70, platforms with pthreads support provide a volserver which like
      the fileserver and butc backup system uses pthreads. Solaris versions 8
      and above, AIX, IRIX, OpenBSD, Darwin, MacOS and Linux clients support
      large (>2gb) files, and provided fileservers have this option enabled.
      HP-UX may also support large files, but has not yet been verified. We hope
      sites which can do so will make use of 1.3.81 on their UNIX platforms and
      provide feedback to help us fix any remaining issues before 1.4 is
      released.

  11. The IBRIX file system is a strong runner for this. by schnook · · Score: 2, Informative

    Check out http://www.ibrix.com/ This is a perfect solution for your requirements. Pixar uses this.

    --
    Every day is Saturday and all the rainbows have silver linings.
  12. aRchive.org by fulldecent · · Score: 2, Informative

    This is the solution archive.org uses.

    http://www.archive.org/web/petabox.php

    They are on the order of petabytes

    --

    -- I was raised on the command line, bitch